Abstract

We give the first non-trivial upper bounds on the average sensitivity and noise 
sensitivity of degree-d polynomial threshold functions (PTFs). These bounds hold 
both for PTFs over the Boolean hypercube $\{-1,1\}^n$ and for PTFs over $\mathbb{R}^n$ under the
standard n-dimensional Gaussian distribution $\mathcal{N}(0, I_n)$. Our bound on the Boolean
average sensitivity of PTFs represents progress towards the resolution of a conjecture 
of Gotsman and Linial [GL94], which states that the symmetric function slicing the 
middle d layers of the Boolean hypercube has the highest average sensitivity of all 
degree-d PTFs. Via the $L_1$ polynomial regression algorithm of Kalai et al. [KKMS08],
our bounds on Gaussian and Boolean noise sensitivity yield polynomial-time agnostic 
learning algorithms for the broad class of constant-degree PTFs under these input 
distributions. 

The main ingredients used to obtain our bounds on both average and noise sensitivity
of PTFs in the Gaussian setting are tail bounds and anti-concentration bounds on
low-degree polynomials in Gaussian random variables [Jan97, CW01]. To obtain our
bound on the Boolean average sensitivity of PTFs, we generalize the "critical-index" 
machinery of [Ser07] (which in that work applies to halfspaces, i.e. degree-1 PTFs) 
to general PTFs. Together with the "invariance principle" of [MOO05], this lets us 
extend our techniques from the Gaussian setting to the Boolean setting. Our bound 
on Boolean noise sensitivity is achieved via a simple reduction from upper bounds on 
average sensitivity of Boolean PTFs to corresponding bounds on noise sensitivity. 
1 Introduction



A degree-d polynomial threshold function (PTF) over a domain $X \subseteq \mathbb{R}^n$ is a Boolean-valued
function $f : X \to \{-1, +1\}$,

$$f(x) = \mathrm{sign}(p(x_1, \ldots, x_n)),$$

where $p : X \to \mathbb{R}$ is a degree-d polynomial with real coefficients. When d = 1 polynomial
threshold functions are simply linear threshold functions (also known as halfspaces or LTFs), 
which play an important role in complexity theory, learning theory, and other fields such 
as voting theory. Low-degree PTFs (where d is greater than 1 but is not too large) are a 
natural generalization of LTFs which are also of significant interest in these fields. 

Over more than twenty years much research effort in the study of Boolean functions has been 
devoted to different notions of the "sensitivity" of a Boolean function to small perturbations 
of its input, see e.g. [KKL88, BT96, BK97, Fri98, BKS99, Shi00, MO03, MOO05, OSSS05,
OS07] and many other works. In this work we focus on two natural and well-studied measures 
of this sensitivity, the "average sensitivity" and the "noise sensitivity." As our main results, 
we give the first non-trivial upper bounds on average sensitivity and noise sensitivity of 
low-degree PTFs. These bounds have several applications in learning theory and complexity 
theory as we describe later in this introduction. 

We now define the notions of average and noise sensitivity in the setting of Boolean functions
$f : \{-1,1\}^n \to \{-1,1\}$. (Our paper also deals with average sensitivity and noise sensitivity
of functions $f : \mathbb{R}^n \to \{-1,1\}$ under the Gaussian distribution, but the precise definitions
are more involved than in the Boolean case so we defer them until later.)

1.1 Average Sensitivity and Noise Sensitivity 

The sensitivity of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ on an input $x \in \{-1,1\}^n$,
denoted $s_f(x)$, is the number of Hamming neighbors $y \in \{-1,1\}^n$ of x (i.e. strings which
differ from x in precisely one coordinate) for which $f(x) \ne f(y)$. The average sensitivity of
f, denoted AS(f), is simply $\mathbf{E}[s_f(x)]$ (where the expectation is with respect to the uniform
distribution over $\{-1,1\}^n$). An alternate definition of average sensitivity can be given in
terms of the influence of individual coordinates on f. For a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$
and a coordinate index $i \in [n]$, the influence of coordinate i on f is the probability
that flipping the i-th bit of a uniform random input $x \in \{-1,1\}^n$ causes the value of f to
change, i.e. $\mathrm{Inf}_i(f) = \Pr[f(x) \ne f(x^{\oplus i})]$ (where the probability is with respect to the uniform
distribution over $\{-1,1\}^n$). The sum of all n coordinate influences, $\sum_{i=1}^{n} \mathrm{Inf}_i(f)$, is called
the total influence of f; it is easily seen to equal AS(f). Bounds on average sensitivity have
been of use in the structural analysis of Boolean functions (see e.g. [KKL88, Fri98, Shi00])
and in developing computationally efficient learning algorithms (see e.g. [BT96, OS07]).
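The two definitions above can be compared directly on small examples. The following is a minimal brute-force sketch, not part of the paper's argument: it enumerates $\{-1,1\}^n$, computes the coordinate influences and the average sensitivity separately, and lets one check that they agree. The function `ptf` is a hypothetical degree-2 PTF chosen only for illustration, and the convention sign(0) = +1 is an assumption of this sketch.

```python
from itertools import product


def sign(v):
    # Convention assumed for this sketch only: sign(0) = +1.
    return 1 if v >= 0 else -1


def flip(x, i):
    """Return x with coordinate i negated."""
    return x[:i] + (-x[i],) + x[i + 1:]


def influences(f, n):
    """Return [Inf_1(f), ..., Inf_n(f)] by enumerating {-1,1}^n."""
    counts = [0] * n
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            if f(flip(x, i)) != fx:
                counts[i] += 1
    return [c / 2 ** n for c in counts]


def average_sensitivity(f, n):
    """Return AS(f) = E_x[s_f(x)] by enumerating {-1,1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        # s_f(x): number of Hamming neighbors on which f changes value.
        total += sum(1 for i in range(n) if f(flip(x, i)) != fx)
    return total / 2 ** n


def ptf(x):
    # Hypothetical degree-2 PTF on 4 variables (illustration only).
    return sign(x[0] * x[1] + x[2] * x[3] + 0.5 * x[0] - 0.25)


def majority3(x):
    return sign(sum(x))  # the sum of 3 signs is never 0


infs = influences(ptf, 4)
as_ptf = average_sensitivity(ptf, 4)
as_maj3 = average_sensitivity(majority3, 3)
```

Enumeration costs $2^n \cdot n$ evaluations of f, so this is only practical for very small n.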

The average sensitivity is a measure of how / changes when a single coordinate is perturbed. 
In contrast, the noise sensitivity of / measures how / changes when a random collection 
of coordinates are all perturbed simultaneously. More precisely, given a noise parameter
$0 \le \epsilon \le 1$ and a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, the noise sensitivity of f at noise
rate $\epsilon$ is defined to be

$$\mathrm{NS}_\epsilon(f) = \Pr_{x,y}[f(x) \ne f(y)],$$

where x is uniform from $\{-1,1\}^n$ and y is obtained from x by flipping each bit independently
with probability $\epsilon$. Noise sensitivity has been studied in a range of contexts including Boolean
function analysis, percolation theory, and computational learning theory [BKS99, KOS04,
MO03, SS, KOS08].
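For small n the noise sensitivity can be computed exactly from this definition. The sketch below is an illustration only: it sums the probability of every pair (x, flip pattern) on which f changes value. The example functions `dictator` and `majority3` are choices made for this sketch, not functions singled out by the paper.

```python
from itertools import product


def noise_sensitivity(f, n, eps):
    """Exact NS_eps(f): x uniform on {-1,1}^n, each bit flipped independently w.p. eps."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for flips in product((0, 1), repeat=n):
            k = sum(flips)  # number of flipped coordinates
            prob = eps ** k * (1 - eps) ** (n - k)
            y = tuple(-xi if b else xi for xi, b in zip(x, flips))
            if f(y) != fx:
                total += prob
    return total / 2 ** n  # average over the uniform choice of x


def dictator(x):
    return x[0]


def majority3(x):
    return 1 if sum(x) > 0 else -1


ns_dictator = noise_sensitivity(dictator, 3, 0.1)
ns_majority = noise_sensitivity(majority3, 3, 0.1)
```

A dictator changes value exactly when its one relevant bit is flipped, so its noise sensitivity equals the noise rate; this gives a quick correctness check for the routine.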

1.2 Main Results: Upper Bounds on Average Sensitivity and 
Noise Sensitivity 

1.2.1 Boolean PTFs 

In 1994 Gotsman and Linial [GL94] conjectured that the symmetric function slicing the
middle d layers of the Boolean hypercube has the highest average sensitivity among all
degree-d PTFs. Since this function has average sensitivity $\Theta(d\sqrt{n})$ for every $1 \le d \le \sqrt{n}$,
this conjecture implies (and is nearly equivalent to) the conjecture that every degree-d PTF
f over $\{-1,1\}^n$ has $\mathrm{AS}(f) \le d\sqrt{n}$.
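To make the conjectured extremal function concrete, here is a small sketch that builds a symmetric degree-d PTF whose sign changes across d consecutive layer boundaries near the middle of the cube, and compares its brute-force average sensitivity with a count of the cut edges. The exact placement of the d thresholds (the helper `slice_thresholds`) is an assumption of this sketch; when d and n have different parities there is no perfectly centered choice, and the sketch shifts the thresholds by one.

```python
from itertools import product
from math import comb


def slice_thresholds(n, d):
    """d consecutive integers near 0 that the coordinate sum (same parity as n) never hits."""
    shift = 0 if (d - n) % 2 == 0 else 1
    return [-(d - 1) + 2 * j + shift for j in range(d)]


def slice_ptf(n, d):
    """f(x) = sign(prod_j (sum(x) - theta_j)), a symmetric degree-d PTF."""
    thetas = slice_thresholds(n, d)

    def f(x):
        s = sum(x)
        value = 1
        for theta in thetas:
            value *= s - theta  # never zero, by the parity choice above
        return 1 if value > 0 else -1

    return f


def average_sensitivity(f, n):
    total = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        total += sum(1 for i in range(n)
                     if f(x[:i] + (-x[i],) + x[i + 1:]) != fx)
    return total / 2 ** n


def slice_as_closed_form(n, d):
    """2 * (#cube edges crossing a threshold) / 2^n."""
    edges = 0
    for theta in slice_thresholds(n, d):
        k = (theta - 1 + n) // 2          # layer (number of +1s) just below theta
        edges += comb(n, k) * (n - k)     # edges between layers k and k+1
    return 2 * edges / 2 ** n


brute = average_sensitivity(slice_ptf(7, 2), 7)
closed = slice_as_closed_form(7, 2)
```

For d = 1 and odd n the construction is the majority function, which gives a simple sanity check.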

Our first main result is an upper bound on average sensitivity which makes progress toward 
this conjecture: 

Theorem 1.1 For any degree-d PTF f over $\{-1,1\}^n$, we have

$$\mathrm{AS}(f) \le 2^{O(d)} \cdot \log n \cdot n^{1 - 1/(4d+2)}.$$

Using a completely different set of techniques, we also prove a different bound which improves
on Theorem 1.1 for $d \le 4$:

Theorem 1.2 For any degree-d PTF f over $\{-1,1\}^n$, we have

$$\mathrm{AS}(f) \le 2n^{1 - 1/2^d}.$$

We give a simple reduction which translates any upper bound on average sensitivity for 
degree-d PTFs over Boolean variables into a corresponding upper bound on noise sensitivity. 
Combining this reduction with Theorems 1.1 and 1.2, we establish: 

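Theorem 1.3 For any degree-d PTF f over $\{-1,1\}^n$ and any $0 \le \epsilon \le 1$, we have

$$\mathrm{NS}_\epsilon(f) \le 2^{O(d)} \cdot \epsilon^{1/(4d+2)} \log(1/\epsilon),$$
$$\mathrm{NS}_\epsilon(f) \le O(\epsilon^{1/2^d}).$$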
1.2.2 Gaussian PTFs



Looking beyond the Boolean hypercube, there are well-studied notions of average sensitivity
and noise sensitivity for Boolean-valued functions over $\mathbb{R}^n$, where we view $\mathbb{R}^n$ as endowed
with the standard multivariate Gaussian distribution $\mathcal{N}(0, I_n)$ [Bog98, MOO05]. Let GAS(f)
denote the Gaussian average sensitivity of a function $f : \mathbb{R}^n \to \{-1,1\}$, and let $\mathrm{GNS}_\epsilon(f)$
denote the Gaussian noise sensitivity at noise rate $\epsilon$. (See Section 2 for precise definitions
of these quantities; here we just note that these are natural analogues of their
uniform-distribution Boolean hypercube counterparts defined above.) We prove an upper bound on
Gaussian average sensitivity of low-degree PTFs:

Theorem 1.4 For any degree-d PTF f over $\mathbb{R}^n$, we have

$$\mathrm{GAS}(f) \le O(d^2 \cdot \log n \cdot n^{1 - 1/2d}).$$

We remark that in the case of degree-d multilinear PTFs it is possible to obtain a slightly
stronger bound of $\mathrm{GAS}(f) \le O(d \cdot \log n \cdot n^{1 - 1/2d})$ using our approach. We also prove an
upper bound on the Gaussian noise sensitivity of degree-d PTFs:

Theorem 1.5 For any degree-d PTF f over $\mathbb{R}^n$ and any $0 \le \epsilon \le 1$, we have

$$\mathrm{GNS}_\epsilon(f) \le O(d \cdot \log^{1/2}(1/\epsilon) \cdot \epsilon^{1/2d}).$$

1.3 Application: agnostically learning constant-degree PTFs in 
polynomial time 

Our bounds on noise sensitivity, together with machinery developed in [KOS04, KKMS08, 
KOS08], let us obtain the first efficient agnostic learning algorithms for low-degree polynomial 
threshold functions. In this section we state our new learning results; details are given in 
Section 8. 

We begin by briefly reviewing the fixed-distribution agnostic learning framework that has 
been studied in several recent works, see e.g. [KKMS08, KOS08, BOW08, GKK08, KMV08, 
SSS09]. Let $\mathcal{D}_X$ be a (fixed, known) distribution over an example space X such as the uniform
distribution over $\{-1,1\}^n$ or the standard multivariate Gaussian distribution $\mathcal{N}(0, I_n)$ over
$\mathbb{R}^n$. Let C denote a class of Boolean functions, such as the class of all degree-d PTFs. An
algorithm A is said to be an agnostic learning algorithm for C under distribution $\mathcal{D}_X$ if
it has the following property: Let $\mathcal{D}$ be any distribution over $X \times \{-1,1\}$ such that the
marginal of $\mathcal{D}$ over X is $\mathcal{D}_X$. Then if A is run on a sample of labeled examples drawn
independently from $\mathcal{D}$, with high probability A outputs a hypothesis $h : X \to \{-1,1\}$ such
that $\Pr_{(x,y)\sim\mathcal{D}}[h(x) \ne y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \min_{f \in C} \Pr_{(x,y)\sim\mathcal{D}}[f(x) \ne y]$. In words, A's
hypothesis is nearly as accurate as the best hypothesis in C.

Kalai et al. [KKMS08] gave an $L_1$ polynomial regression algorithm and showed that it can
be used for agnostic learning. More precisely, they showed that for a class C of functions and
a distribution $\mathcal{D}$, if every function in C has a low-degree polynomial approximator (in the
$L_2$ norm) under the marginal distribution $\mathcal{D}_X$, then the $L_1$ polynomial regression algorithm
is an efficient agnostic learning algorithm for C under $\mathcal{D}_X$. They used this $L_1$ polynomial
regression algorithm together with the existence of low-degree polynomial approximators for
halfspaces (under the uniform distribution on $\{-1,1\}^n$ and the standard Gaussian distribution
$\mathcal{N}(0, I_n)$ on $\mathbb{R}^n$) to obtain $n^{O(1/\epsilon^4)}$-time agnostic learning algorithms for halfspaces
under these distributions.

Using ingredients from [KOS04], one can easily convert upper bounds on Boolean noise 
sensitivity (such as Theorem 1.3) into results asserting the existence of low-degree L2-norm 
polynomial approximators under the uniform distribution on $\{-1,1\}^n$. We thus obtain the
following agnostic learning result (a more detailed proof is given in Section 8):

Theorem 1.6 The class of degree-d PTFs is agnostically learnable under the uniform
distribution on $\{-1,1\}^n$ in time

$$n^{2^{O(d^2)} (\log 1/\epsilon)^{4d+2} / \epsilon^{8d+4}}.$$

For $d \le 4$, this bound can be improved to $n^{O(1/\epsilon^{2^{d+1}})}$.

Similarly, using ingredients from [KOS08], one can easily convert upper bounds on Gaussian 
noise sensitivity (such as Theorem 1.5) into results asserting the existence of low-degree 
$L_2$-norm polynomial approximators under $\mathcal{N}(0, I_n)$. This lets us obtain

Theorem 1.7 The class of degree-d PTFs is agnostically learnable under any n-dimensional
Gaussian distribution in time $n^{(d/\epsilon)^{O(d)}}$.

For e constant, these results are the first polynomial-time agnostic learning algorithms for 
constant-degree PTFs. 

1.4 Other applications 

The results and approaches of this paper have found other recent applications beyond the 
agnostic learning results presented above; we describe two of these below. 

Gopalan and Servedio [GS09] have combined the average sensitivity bound given by Theorem 1.1
with techniques from [LMN93] to give the first sub-exponential time algorithms for learning
$\mathrm{AC}^0$ circuits augmented with a small (but super-constant) number of arbitrary threshold
gates, i.e. gates that compute arbitrary LTFs which may have weights of any magnitude.
(Previous work using different techniques [JKS02] could only handle $\mathrm{AC}^0$ circuits augmented
with majority gates.)

In other recent work Diakonikolas et al. [DSTW09] have refined the approach used to
prove Theorem 1.1 to establish a "regularity lemma" for low-degree polynomial threshold
functions. Roughly speaking, this lemma says that any degree-d PTF can be decomposed
into a constant number of subfunctions, almost all of which are "regular" degree-d PTFs.
[DSTW09] apply this regularity lemma to extend the positive results on the existence of
low-weight approximators for LTFs, proved in [Ser07], to low-degree PTFs.

Related work. Simultaneously and independently of this work, Harsha et al. [HKM09]
have obtained very similar results on average sensitivity, noise sensitivity, and agnostic
learning of low-degree PTFs using techniques very similar to ours.

1.5 Techniques 

In this section we give a high-level overview of how Theorems 1.1, 1.4 and 1.5 are proved. 
(As mentioned earlier, Theorem 1.2 is proved using completely different techniques; see
Section 6.) The arguments are simpler for the Gaussian setting so we begin with these. 

1.5.1 The Gaussian case 

We sketch the argument for the Gaussian noise sensitivity bound, Theorem 1.5; the Gaussian
average sensitivity bound, Theorem 1.4, follows along similar lines.

Let $f = \mathrm{sign}(p)$ where $p : \mathbb{R}^n \to \mathbb{R}$ is a degree-d polynomial. The Gaussian noise sensitivity
$\mathrm{GNS}_\epsilon(f)$ of f is equal to $\Pr_{x,y}[f(x) \ne f(y)]$ where x is distributed according to $\mathcal{N}(0, I_n)$ and
y is an "$\epsilon$-perturbed" version of x (see Section 2 for the precise definition). Intuitively, the
event $f(x) \ne f(y)$ can only take place if either

• x lies close to the boundary of p, i.e. $|p(x)|$ is "small", or

• $|p(x) - p(y)|$ is "large".

We use an anti-concentration result for polynomials in Gaussian random variables, due to
Carbery and Wright [CW01], to show that $|p(x)|$ is "small" only with low probability. For
the second bullet, it turns out that $p(x) - p(y)$ can be expressed as a low-degree polynomial in
independent Gaussian random variables, and thus we can apply tail bounds for this setting
[Jan97] to show that $|p(x) - p(y)|$ is "large" only with low probability. We can thus argue
that $\Pr_{x,y}[f(x) \ne f(y)]$ is low, and bound the Gaussian noise sensitivity of f. (We note that
this high-level explanation glosses over some significant technical issues. In particular, since
we are dealing with general degree-d PTFs which may not be multilinear, it is nontrivial to
establish the conditions that allow us to apply the tail bound; see the proof of Claim 4.1 in
Section 4.1.)
1.5.2 The Boolean case



One advantage of working over the Boolean domain $\{-1,1\}^n$ is that without loss of generality
we may consider only multilinear PTFs, where $f = \mathrm{sign}(p(x))$ for p a multilinear polynomial.
However, this advantage is offset by the fact that the uniform distribution on $\{-1,1\}^n$ is
less symmetric than the Gaussian distribution; for example, every degree-1 PTF under the
Gaussian distribution $\mathcal{N}^n$ is equivalent simply to $\mathrm{sign}(x_1 - \theta)$, but this is of course not true
for degree-1 PTFs over $\{-1,1\}^n$. Our upper bound on Boolean average sensitivity uses ideas
from the Gaussian setting but also requires significant additional ingredients.

An important notion in the Boolean case is that of a "regular" PTF; this is a PTF $f = \mathrm{sign}(p)$
where every variable in the polynomial p has low influence. (See Section 2 for a definition of
the influence of a variable on a real-valued function; note that the definition from Section 1.1
applies only for Boolean-valued functions.) If f is a regular PTF, then the "invariance
principle" of [MOO05] tells us that $p(x)$ (where x is uniform from $\{-1,1\}^n$) behaves much
like $p(G)$ (where G is drawn from $\mathcal{N}(0, I_n)$), and essentially the arguments from the Gaussian
case can be used.

It remains to handle the case where f is not a regular PTF, i.e. some variable has high
influence in p. To accomplish this, we generalize the notion of the "critical-index" of a
halfspace (see [Ser07, DGJ+09]) to apply to PTFs. We show that a carefully chosen random
restriction (one which fixes only the variables up to the critical index - very roughly speaking,
only the highest-influence variables - and leaves the other ones free) has non-negligible
probability of causing f to collapse down to a regular PTF. This lets us give a recursive
bound on average sensitivity which ends up being not much worse than the bound that can
be obtained for the regular case; see Section 5.1 for a detailed explanation of the recursive
argument.

1.6 Organization 

Formal definitions of average sensitivity and noise sensitivity (especially in the Gaussian 
case), and tail bounds and anticoncentration results for low degree polynomials are presented 
in Section 2. In Section 3, we show an upper bound on the Gaussian average sensitivity 
of PTFs (Theorem 1.4). Upper bounds on Gaussian noise sensitivity (Theorem 1.5) are 
obtained in the section that follows (Section 4). 

The main result of the paper - a bound on the Boolean average sensitivity (Theorem 1.1) - 
is proved in Section 5. In Section 6, an alternate bound for Boolean average sensitivity that 
is better for degrees $d \le 4$ (Theorem 1.2) is shown. This is followed by a reduction from
Boolean average sensitivity bounds to corresponding noise sensitivity bounds (Theorem 7.1) 
in Section 7. We present the applications of these upper bounds to agnostic learning of 
PTFs in Section 8. Section 9 concludes by proposing a direction for future work towards the 
resolution of the Gotsman-Linial conjecture. 
2 Definitions and Background



2.1 Basic Definitions 

In this subsection we record the basic notation and definitions used throughout the paper. 
For $n \in \mathbb{N}$ we denote by [n] the set $\{1, 2, \ldots, n\}$. We write $\mathcal{N}$ to denote the standard
univariate Gaussian distribution $\mathcal{N}(0, 1)$.

For a degree-d polynomial $p : X \to \mathbb{R}$ we denote by $\|p\|_2$ its $\ell_2$ norm, $\|p\|_2 = \mathbf{E}_x[p(x)^2]^{1/2}$,
where the intended distribution over $x \in X$ (which will always be either uniform over
$\{-1,1\}^n$, or the $\mathcal{N}^n$ distribution) will always be clear from context. We note that for
multilinear p the two notions are always equal (see e.g. Proposition 3.5 of [MOO05]).

We now proceed to define the notion of influence for real-valued functions in a product 
probability space. Throughout this paper we consider either the uniform distribution on the 
hypercube $\{\pm 1\}^n$ or the standard n-dimensional Gaussian distribution in $\mathbb{R}^n$. However, for
the sake of generality, we adopt this more general setting. 

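Let $(\Omega_1, \mu_1), \ldots, (\Omega_n, \mu_n)$ be probability spaces and let $(\Omega = \otimes_{i=1}^{n} \Omega_i,\ \mu = \otimes_{i=1}^{n} \mu_i)$ denote the
corresponding product space. Let $f : \Omega \to \mathbb{R}$ be any square integrable function on $(\Omega, \mu)$,
i.e. $f \in L^2(\Omega, \mu)$. The influence of the ith coordinate on f [MOO05] is

$$\mathrm{Inf}_i(f) := \mathbf{E}_{\mu}[\mathrm{Var}_{\mu_i}[f]]$$

and the total influence of f is $\mathrm{Inf}(f) := \sum_{i=1}^{n} \mathrm{Inf}_i(f)$.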

For a function $f : \{-1,1\}^n \to \mathbb{R}$ over the Boolean hypercube endowed with the uniform
distribution, the influence of variable i on f can be expressed in terms of the Fourier coefficients
of f as

$$\mathrm{Inf}_i(f) = \sum_{S \ni i} \hat{f}(S)^2,$$

and as mentioned in the introduction it is easily seen that $\mathrm{AS}(f) = \mathrm{Inf}(f)$ for Boolean-valued
functions $f : \{-1,1\}^n \to \{-1,1\}$.
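The Fourier formula for influence can be checked against the flip-probability definition from Section 1.1 on a small example. The sketch below computes all Fourier coefficients by brute force; the example function is a hypothetical degree-2 PTF chosen for illustration, and the routine is exponential in n.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} with f_hat(S) = E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def fourier_influence(coeffs, i):
    """Inf_i(f) as the sum of squared coefficients over sets containing i."""
    return sum(c * c for S, c in coeffs.items() if i in S)


def flip_influence(f, n, i):
    """Inf_i(f) as Pr_x[f(x) != f(x with bit i flipped)], for Boolean-valued f."""
    points = list(product((-1, 1), repeat=n))
    changed = sum(1 for x in points
                  if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))
    return changed / len(points)


def ptf(x):
    # Hypothetical degree-2 PTF on 3 variables (illustration only).
    return 1 if x[0] * x[1] + x[1] * x[2] + 0.5 * x[0] - 0.25 >= 0 else -1


coeffs = fourier_coefficients(ptf, 3)
pairs = [(fourier_influence(coeffs, i), flip_influence(ptf, 3, i)) for i in range(3)]
```

For a real-valued polynomial p only `fourier_influence` applies, which is the notion of influence used for p in the rest of the paper.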

In this paper we are concerned with variable influences for functions defined over $\{-1,1\}^n$
under the uniform distribution, and over $\mathbb{R}^n$ under $\mathcal{N}(0, I_n)$; we shall adopt the convention that
$\mathrm{Inf}_i(f)$ denotes the former and $\mathrm{GI}_i(f)$ the latter. We also denote by $\mathrm{GAS}(f) = \sum_{i \in [n]} \mathrm{GI}_i(f)$
the Gaussian average sensitivity.

Note that for a function $f : \mathbb{R}^n \to \{-1,1\}$, the Gaussian influence $\mathrm{GI}_i(f)$ can be equivalently
written as: $\mathrm{GI}_i(f) = 2 \Pr_{x, x^i}[f(x) \ne f(x^i)]$, where $x \sim \mathcal{N}^n$ and $x^i$ is obtained by replacing
the i-th coordinate of x by an independent random sample from $\mathcal{N}$.
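The resampling formula suggests a direct Monte Carlo estimator for the Gaussian influence of a Boolean-valued function. The sketch below is an illustration under stated assumptions: the sample size, the seed, and the example halfspace are arbitrary choices, and the estimate carries sampling error on the order of $1/\sqrt{\text{samples}}$.

```python
import random


def gaussian_influence_mc(f, n, i, samples=50_000, seed=0):
    """Estimate GI_i(f) = 2 * Pr[f(x) != f(x^i)] by resampling coordinate i."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fx = f(x)
        x[i] = rng.gauss(0.0, 1.0)  # replace coordinate i by a fresh N(0,1) sample
        if f(x) != fx:
            disagreements += 1
    return 2.0 * disagreements / samples


def halfspace(x):
    # f(x) = sign(x_0); it ignores every other coordinate.
    return 1 if x[0] >= 0 else -1


gi_relevant = gaussian_influence_mc(halfspace, 2, 0)
gi_irrelevant = gaussian_influence_mc(halfspace, 2, 1)
```

For sign(x_0) the conditional variance over x_0 is 1, so the first estimate should be close to 1, while the irrelevant coordinate has influence exactly 0.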

We proceed to define the notion of noise sensitivity for Boolean-valued functions in $(\mathbb{R}^n, \mathcal{N}^n)$.
For the domain $\{-1,1\}^n$, the notion has been defined already in the introduction. (We
remark that "noise sensitivity" can be defined in a much more general setting and also for 
real-valued functions; but such generalizations are not needed here.) 
Definition 1 (Gaussian Noise Sensitivity) Given $f : \mathbb{R}^n \to \{-1,1\}$, the "Gaussian
noise sensitivity of f at noise rate $\epsilon \in [0, 1]$" is

$$\mathrm{GNS}_\epsilon(f) := \Pr_{x,y}[f(x) \ne f(y)],$$

where $x \sim \mathcal{N}^n$ and $y := (1 - \epsilon)\,x + \sqrt{2\epsilon - \epsilon^2}\, z$ for an independent Gaussian noise vector $z \sim \mathcal{N}^n$.
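Definition 1 translates directly into a sampling procedure. The following sketch estimates the Gaussian noise sensitivity by Monte Carlo; the sample size, seed, and example functions are assumptions of this sketch. For the halfspace sign(x_0) the estimate can be compared with the classical closed form $\arccos(1-\epsilon)/\pi$ for the probability that two standard Gaussians with correlation $1-\epsilon$ have different signs.

```python
import math
import random


def gaussian_noise_sensitivity_mc(f, n, eps, samples=50_000, seed=0):
    """Estimate GNS_eps(f) with y = (1 - eps) x + sqrt(2 eps - eps^2) z."""
    rng = random.Random(seed)
    alpha = 1.0 - eps
    beta = math.sqrt(2.0 * eps - eps * eps)  # alpha^2 + beta^2 = 1, so y ~ N^n
    disagreements = 0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [alpha * xi + beta * zi for xi, zi in zip(x, z)]
        if f(x) != f(y):
            disagreements += 1
    return disagreements / samples


def halfspace(x):
    return 1 if x[0] >= 0 else -1


def degree2_ptf(x):
    # Hypothetical degree-2 PTF (illustration only).
    return 1 if x[0] * x[1] - 0.3 >= 0 else -1


eps = 0.1
gns_halfspace = gaussian_noise_sensitivity_mc(halfspace, 2, eps)
gns_degree2 = gaussian_noise_sensitivity_mc(degree2_ptf, 2, eps)
halfspace_exact = math.acos(1.0 - eps) / math.pi
```

The choice $\beta = \sqrt{2\epsilon - \epsilon^2}$ is exactly what keeps each coordinate of y a standard Gaussian.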

Fourier and Hermite Analysis. We assume familiarity with the basics of Fourier analysis 
over the Boolean hypercube $\{-1,1\}^n$. We will also require similar basics of Hermite analysis
over the space $\mathbb{R}^n$ equipped with the standard n-dimensional Gaussian distribution $\mathcal{N}^n$; a
brief review is provided in Appendix A.

2.2 Probabilistic Facts 

In this subsection, we record the basic probabilistic tools we use in our proofs. 

We first recall the following well-known consequence of hypercontractivity (see e.g. Lecture 
16 of [O'D07] for the Boolean setting and [Bog98] for the Gaussian setting):

Theorem 2.1 Let $p : X \to \mathbb{R}$ be a degree-d polynomial, where X is either $\{-1,1\}^n$ under
the uniform distribution or $\mathbb{R}^n$ under $\mathcal{N}^n$, and fix $q > 2$. Then

$$\|p\|_q \le (q - 1)^{d/2} \|p\|_2.$$

We will need a concentration bound for low-degree polynomials over independent random 
signs or standard Gaussians. It can be proved (in both cases) using Markov's inequality and 
hypercontractivity, see e.g. [Jan97, O'D07, AH09]. 

Theorem 2.2 ("degree-d Chernoff bound") Let $p(x)$ be a degree-d polynomial. Let x
be drawn either from the uniform distribution in $\{-1,1\}^n$ or from $\mathcal{N}^n$. For any $t > e^d$, we
have

$$\Pr_x[|p(x)| \ge t\|p\|_2] \le \exp(-\Omega(t^{2/d})).$$

The second fact is a powerful anti-concentration bound for low-degree polynomials over 
Gaussian random variables. (We note that this result does not hold in the Boolean setting.) 

Theorem 2.3 ([CW01]) Let $p : \mathbb{R}^n \to \mathbb{R}$ be a degree-d polynomial. Then for all $\epsilon > 0$, we
have

$$\Pr_{x \sim \mathcal{N}^n}[|p(x)| \le \epsilon\|p\|_2] \le O(d\epsilon^{1/d}).$$
We also make essential use of a (weak) anti-concentration property of low-degree polynomials
over the hypercube $\{-1,1\}^n$:

Theorem 2.4 ([DFKO06, AH09]) Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-d polynomial with
$\mathrm{Var}[p] = \sum_{0 < |S| \le d} \hat{p}(S)^2 = 1$ and $\mathbf{E}[p] = \hat{p}(\emptyset) = 0$. Then we have

$$\Pr[p(x) \ge 1/2^{O(d)}] \ge 1/2^{O(d)} \quad\text{and hence}\quad \Pr[|p(x)| \ge 1/2^{O(d)}] \ge 1/2^{O(d)}.$$

The following is a restatement of the invariance principle, specifically Theorem 3.19 under 
hypothesis H4 in [MOO05]. 

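Theorem 2.5 ([MOO05]) Let $p(x) = \sum_{|S| \le d} \hat{p}(S) x_S$ be a degree-d multilinear polynomial
with $\sum_{0 < |S| \le d} \hat{p}(S)^2 = 1$. Suppose each variable $i \in [n]$ has low influence $\mathrm{Inf}_i(p) \le \tau$, i.e.
$\sum_{S \ni i} \hat{p}(S)^2 \le \tau$. Let x be drawn uniformly from $\{-1,1\}^n$ and $G \sim \mathcal{N}^n$. Then,

$$\sup_{t \in \mathbb{R}} \big| \Pr[p(x) \le t] - \Pr[p(G) \le t] \big| \le O(d\tau^{1/(4d+1)}).$$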

3 Gaussian Average Sensitivity 

In this section we prove an upper bound on the Gaussian average sensitivity of degree-d 
PTFs (Theorem 1.4). 

The following lemma, which relates the influence of a variable on / to its influence on the 
polynomial p, is central to the argument. 

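Lemma 3.1 Let $p : \mathbb{R}^n \to \mathbb{R}$ be a degree-d polynomial over Gaussian inputs with $\mathrm{Var}[p] = 1$
and let $f = \mathrm{sign}(p)$. Then for each $i \in [n]$,

$$\mathrm{GI}_i(f) \le O(d^2 \cdot \mathrm{GI}_i(p)^{1/(2d)} \cdot \log(1/\mathrm{GI}_i(p))).$$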

Proof: [of Lemma 3.1] Let $p(x)$ be a degree-d polynomial where $\|p\|_2 = 1$. For notational
convenience let us fix $i = 1$ and let $\tau = \mathrm{GI}_1(p)$. We may assume that $\tau < 1/4$ since
otherwise the claimed bound holds trivially. We express $p(x)$ as a univariate polynomial in
$x_1$ as follows,

$$p(x) = p(x_1, \ldots, x_n) = \sum_{i=0}^{d} p_i(x_2, \ldots, x_n) \cdot h_i(x_1),$$

where $h_i(x_1)$ is the univariate degree-i Hermite polynomial. Note that for any multi-index
$S = (S_2, \ldots, S_n) \in \mathbb{N}^{n-1}$ and $0 \le i \le d$, we have $\hat{p}_i(S) = \hat{p}(S')$ where $S' = (i, S_2, \ldots, S_n) \in \mathbb{N}^n$.
As a result, using Parseval's identity for the Hermite basis, we have that

$$\|p\|^2 = \sum_{i=0}^{d} \|p_i\|^2.$$
i=0 
We further have

$$\sum_{i=1}^{d} \sum_{S \in \mathbb{N}^{n-1}} \hat{p}_i(S)^2 = \sum_{S : S_1 > 0} \hat{p}(S)^2.$$

Consequently the 2-norms of $p_1, \ldots, p_d$ are "small" and the 2-norm of $p_0$ is "large":

$$\sum_{i=1}^{d} \|p_i\|^2 = \sum_{S : S_1 > 0} \hat{p}(S)^2 = \mathrm{GI}_1(p) = \tau \quad\text{and}\quad \|p_0\|^2 = 1 - \tau \ge 1/2.$$

Let $t = C^{d/2} \tau^{1/2} \log^{d/2}(1/\tau)$ and $\gamma = d^2 \cdot \tau^{1/2d} \log(1/\tau)$, where C is an absolute constant that
will be defined later in Claim 3.3. We can assume that $\gamma < 1/10$ since otherwise the bound
of Lemma 3.1 holds trivially. For these values of t and $\gamma$, the proof strategy is as follows:

• We use the "small ball probability" bound (Theorem 2.3) to argue that with high
probability $p_0(g_2, \ldots, g_n)$ is not too small: more precisely,
$\Pr_{(g_2,\ldots,g_n) \sim \mathcal{N}^{n-1}}[|p_0(g_2, \ldots, g_n)| \le td(2ed\log(1/\gamma))^{d/2}] \le O(\gamma)$ (see Claim 3.2).

• We use the concentration bound (Theorem 2.2) to argue that with high probability each
$p_i(g_2, \ldots, g_n)$, $i \in [d]$, is not too large: more precisely,
$\Pr_{(g_2,\ldots,g_n) \sim \mathcal{N}^{n-1}}[|p_i(g_2, \ldots, g_n)| > t] \le O(\gamma)$ (see Claim 3.3).

• We use elementary properties of the $\mathcal{N}(0, 1)$ distribution to argue that if $|a| \ge td(2ed\log(1/\gamma))^{d/2}$
and $|b_i| \le t$, then the function $\mathrm{sign}(a + \sum_{i=1}^{d} b_i h_i(g_1))$ (a function of one $\mathcal{N}(0, 1)$ random
variable $g_1$) is $O(\gamma)$-close to the constant function $\mathrm{sign}(a)$ (see Claim 3.5).

• Thus we know that with probability at least $1 - O(\gamma)$ over the choice of $g_2, \ldots, g_n$,
we have $\mathrm{Var}_{g_1}[\mathrm{sign}(p(g_1, \ldots, g_n))] \le O(\gamma(1 - \gamma)) \le O(\gamma)$. For the remaining (at most)
$O(\gamma)$ fraction of outcomes for $g_2, \ldots, g_n$ we always have $\mathrm{Var}_{g_1}[\mathrm{sign}(p(g_1, \ldots, g_n))] \le 1$,
so overall we get $\mathrm{GI}_1(\mathrm{sign}(p)) \le O(\gamma)$.

Thus, to complete the proof of the lemma, it suffices to prove the three aforementioned 
claims. 

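Claim 3.2 With probability at least $1 - O(\gamma)$ over draws $(g_2, \ldots, g_n) \sim \mathcal{N}^{n-1}$, the polynomial
$p_0(g_2, \ldots, g_n)$ has magnitude at least $td(2ed\log(1/\gamma))^{d/2}$.

Proof: Applying Theorem 2.3 to the polynomial $p_0(x_2, \ldots, x_n)$ we get:

$$\Pr_{g_2,\ldots,g_n}\big[|p_0(g_2, \ldots, g_n)| \le td(2ed\log(1/\gamma))^{d/2}\big] \le O(d) \cdot \left(\frac{td(2ed\log(1/\gamma))^{d/2}}{\|p_0\|}\right)^{1/d}.$$

Recall that $\|p_0\|^2 \ge 1/2$, and so by our choice of t and $\gamma$ it follows that the right hand side is:

$$O(d^{3/2}) \cdot O(\tau^{1/2d} \log^{1/2}(1/\tau) \cdot \log^{1/2}(1/\gamma)) = O(\gamma).$$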
Claim 3.3 For each $i \in [d]$, the polynomial $p_i(g_2, \ldots, g_n)$ has magnitude larger than t with
probability at most $\gamma/d$. Therefore, the probability that any $p_i(g_2, \ldots, g_n)$ has magnitude
larger than t is at most $\gamma$.

Proof: First note that since $\sum_{i=1}^{d} \|p_i\|^2 = \tau$, certainly for each $i \in [d]$ we have $\|p_i\| \le \sqrt{\tau}$.
Therefore,

$$|\mathbf{E}[p_i]| \le \mathbf{E}[p_i^2]^{1/2} = \|p_i\| \le \sqrt{\tau}.$$

Let $p_i' = p_i - \mathbf{E}[p_i]$, so $\mathbf{E}[p_i'] = 0$. Applying Theorem 2.2, we get:

$$\Pr_{g_2,\ldots,g_n}\left[|p_i'(g_2, \ldots, g_n)| > \frac{t - \sqrt{\tau}}{\|p_i'\|} \cdot \|p_i'\|\right] \le 2\exp\left(-\Omega\left(\left(\frac{t - \sqrt{\tau}}{\|p_i'\|}\right)^{2/d}\right)\right).$$

Given our bound on $\|p_i'\| \le \|p_i\| \le \sqrt{\tau}$ and choice of t, we know that the probability bound
is at most $2\exp(-\Omega(C\log(1/\tau)))$. For a sufficiently large absolute constant C this is at most
$\exp(-4\log(1/\tau)) = \tau^4 < \gamma/d$. To complete the proof note that if $|p_i'| \le t - \sqrt{\tau}$ then certainly
$|p_i| \le t$. ■

We will need the following lemma in the proof of Claim 3.5: 

Lemma 3.4 The degree-d Hermite polynomial $h_d(x)$, $d \ge 1$, satisfies the following bound
for all x:

$$|h_d(x)| \le (ed)^{d/2} \cdot \max\{1, |x|^d\}.$$

Proof: The lemma is immediate for d = 1. For $d \ge 2$, we note that the polynomial $h_d(x)$
has at most d terms, each of which has coefficients of magnitude at most $\sqrt{d!} \le d^{d-1}/\sqrt{d!}$.
This directly gives $|h_d(x)| \le (d^d/\sqrt{d!}) \cdot \max\{1, |x|^d\}$. The claimed inequality follows easily from
this using Stirling's approximation. ■
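The bound of Lemma 3.4 is easy to test numerically. The sketch below assumes that $h_d$ denotes the probabilists' Hermite polynomial normalized to have unit second moment under $\mathcal{N}(0,1)$, i.e. $h_d = He_d/\sqrt{d!}$; this normalization is an assumption here (the paper's precise convention is in its Appendix A), though it is the one consistent with the use of Parseval's identity above. The grid of test points is arbitrary.

```python
import math


def normalized_hermite(d, x):
    """h_d(x) = He_d(x) / sqrt(d!), using He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    if d == 0:
        return 1.0
    prev, cur = 1.0, x  # He_0(x), He_1(x)
    for k in range(1, d):
        prev, cur = cur, x * cur - k * prev
    return cur / math.sqrt(math.factorial(d))


def lemma_bound(d, x):
    """Right-hand side of Lemma 3.4: (e d)^(d/2) * max{1, |x|^d}."""
    return (math.e * d) ** (d / 2) * max(1.0, abs(x) ** d)


# Check the inequality on a grid of points in [-6, 6] for degrees 1..10.
grid = [j / 20.0 for j in range(-120, 121)]
violations = [
    (d, x)
    for d in range(1, 11)
    for x in grid
    if abs(normalized_hermite(d, x)) > lemma_bound(d, x)
]
```

An empty `violations` list on this grid is only a sanity check, not a proof; the lemma itself covers all real x.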



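Claim 3.5 Suppose $|a| \ge td(2ed\log(1/\gamma))^{d/2}$, $|b_i| \le t$ for all $i \in [d]$, and $\gamma < 1/10$. Then,

$$\Pr_{g_1 \sim \mathcal{N}(0,1)}\left[\mathrm{sign}\Big(a + \sum_{i=1}^{d} b_i h_i(g_1)\Big) \ne \mathrm{sign}(a)\right] \le O(\gamma).$$

Proof: If $\mathrm{sign}(a + \sum_{i=1}^{d} b_i h_i(x)) \ne \mathrm{sign}(a)$ then it has to be the case that:

$$\Big|\sum_{i=1}^{d} b_i h_i(x)\Big| \ge |a|.$$

By Lemma 3.4 we know that for all x, we have

$$\Big|\sum_{i=1}^{d} b_i h_i(x)\Big| \le td \cdot \max_{1 \le i \le d} |h_i(x)| \le td(ed)^{d/2} \cdot \max\{1, |x|^d\}.$$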
l<i<d 
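Now if $|x|$ is at most $\sqrt{2\log(1/\gamma)}$, since $\gamma < 1/10$ we have $\sqrt{2\log(1/\gamma)} > 1$ and hence it
follows that

$$\Big|\sum_{i=1}^{d} b_i h_i(x)\Big| \le td(ed)^{d/2} \cdot (2\log(1/\gamma))^{d/2} = td(2ed\log(1/\gamma))^{d/2}.$$

In other words, if $\mathrm{sign}(a + \sum_{i=1}^{d} b_i h_i(x))$ differs from $\mathrm{sign}(a)$, it must necessarily be the case
that $|x| > \sqrt{2\log(1/\gamma)}$. The standard tail bound on Gaussians,

$$\Pr_{g_1 \sim \mathcal{N}(0,1)}[g_1 \le c] \le \frac{1}{\sqrt{2\pi}\,|c|} \exp(-c^2/2) \quad\text{for } c < 0,$$

completes the proof. ■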
The proof of Lemma 3.1 is now complete. ■ 
We can now complete the proof of Theorem 1.4. 

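Proof: [Proof of Theorem 1.4] Let us denote $\mathrm{GI}_i(p)$ by $\tau_i$ for $i \in [n]$. Note that since p is of
degree d, we have

$$\sum_{i \in [n]} \tau_i = \sum_{i \in [n]} \sum_{S \ni i} \hat{p}(S)^2 \le d \sum_{|S| \le d} \hat{p}(S)^2 \le d. \qquad (1)$$

Let $\alpha_d(x) = d^2 x^{1/2d} \log(1/x)$. By Lemma 3.1 the average sensitivity of f can be bounded as

$$\mathrm{GAS}(f) = \sum_{i \in [n]} \mathrm{GI}_i(f) \le O\Big(\sum_{i \in [n]} \alpha_d(\tau_i)\Big).$$

The function $\alpha_d(x)$ is monotone increasing and concave in $[0, e^{-2d}]$. In this light, we split the
summation into terms greater than $e^{-2d}$ and the rest. Let $S = \{i \mid \tau_i > e^{-2d}\}$ and $T = [n] \setminus S$.
From (1), we have $|S| \le de^{2d}$. Observe that for $n \le (27d^2)^{2d}$, Theorem 1.4 holds trivially
since $\mathrm{GAS}(f) \le n \le 27d^2 n^{1-1/2d} \le 27d^2 n^{1-1/2d} \log n$. Hence we may assume $n > (27d^2)^{2d}$,
and consequently $|T|$ is at least $n/2$. Using concavity and monotonicity of $\alpha_d$, we can write

$$\sum_{i \in T} \alpha_d(\tau_i) \le |T| \cdot \alpha_d\Big(\Big(\sum_{i \in T} \tau_i\Big)/|T|\Big) \le n\,\alpha_d(2d/n) \le O(d^2 n^{1-1/2d} \log n).$$

Therefore, the average sensitivity of f is bounded by

$$\mathrm{GAS}(f) = \sum_{i \in S} \mathrm{GI}_i(f) + \sum_{i \in T} \mathrm{GI}_i(f) \le |S| + O\Big(\sum_{i \in T} \alpha_d(\tau_i)\Big) \le de^{2d} + O(d^2 n^{1-1/2d} \log n).$$

For all $d \ge 1$ we have

$$de^{2d} \le (3d^{1/3})^{3d} \le (27d)^d < n^{1/2}, \quad\text{since } n > (27d^2)^{2d}.$$

Consequently we have $\mathrm{GAS}(f) \le n^{1/2} + O(d^2 n^{1-1/2d} \log n) = O(d^2 n^{1-1/2d} \log n)$, and the
proof is complete. ■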
4 Gaussian Noise Sensitivity

In this section we prove an upper bound on the noise sensitivity of degree-d PTFs.

Proof: [of Theorem 1.5] Let $f = \mathrm{sign}(p)$, where $p = p(x_1, \ldots, x_n)$ is a degree-d polynomial
with $\mathbf{E}_{x \sim \mathcal{N}^n}[p(x)^2]^{1/2} = \|p\|_2 = 1$. Recall that $\mathrm{GNS}_\epsilon(f)$ equals $\Pr_{x,z}[f(x) \ne f(y)]$ where
$x \sim \mathcal{N}^n$, $z \sim \mathcal{N}^n$; x and z are independent; and $y = \alpha x + \beta z$, with $\alpha := 1 - \epsilon$ and
$\beta := \sqrt{2\epsilon - \epsilon^2}$.

We can assume wlog that $\epsilon \le 2^{-2d-1}$, since otherwise the theorem trivially holds.
Let us define the function

$$q(x, z) = p(x) - p(y).$$

Note that q is a degree-d polynomial over 2n variables.

Fix a real number $t^* > 0$. It is easy to see that $f(x) \ne f(y)$ only if at least one of the
following two events holds:

(Event $E_1$) $|p(x)| \le t^*$ OR (Event $E_2$) $|q(x, z)| \ge t^*$.

We will upper bound the probability of these two events for a carefully chosen $t^*$. We will
bound the probability of the event $E_1$ using Carbery-Wright (Theorem 2.3), the probability
of event $E_2$ using the tail bound for degree-d polynomials (Theorem 2.2) and then apply a
union bound.

The choice of $t^*$ will be dictated by Theorem 2.2. More precisely, to apply Theorem 2.2, a
bound on $\|q\|_2$ is needed. To this end, we show the following claim:

Claim 4.1 We have $\|q\|_2 = O(d \cdot \sqrt{\epsilon})$.

The proof of this claim is somewhat involved and is deferred to Section 4.1.
Fix $t^* = \Theta(d\sqrt{\epsilon} \log^{d/2}(1/\epsilon))$. By Theorem 2.3, we have:

$$\Pr_{x \sim \mathcal{N}^n}[|p(x)| \le t^*] = O(d \cdot (t^*)^{1/d}) = O(d \cdot \epsilon^{1/2d} \cdot \log^{1/2}(1/\epsilon)).$$

Since both x and y are individually distributed according to $\mathcal{N}^n$, we have $\mathbf{E}[q(x, z)] =
\mathbf{E}[p(x) - p(y)] = 0$. By Theorem 2.2 and Claim 4.1, we get

$$\Pr\left[|q(x, z)| \ge \frac{t^*}{\|q\|_2} \cdot \|q\|_2\right] \le 2\exp\left(-\Omega\left(\left(\frac{t^*}{\|q\|_2}\right)^{2/d}\right)\right) \le \epsilon.$$

Hence, by a union bound the noise sensitivity is $O(d \cdot \epsilon^{1/2d} \cdot \log^{1/2}(1/\epsilon))$. This completes
the proof of Theorem 1.5. ■
4.1 Proof of Claim 4.1



Let $p : \mathbb{R}^n \to \mathbb{R}$ be a degree-d polynomial over independent standard Gaussian random
variables. Let us assume that $\|p\|_2 = 1$ and that $\epsilon \le 2^{-2d-1}$. We will show that

$$\|q\|_2 = O(d\sqrt{\epsilon}).$$

It will be convenient for the proof to express p in an appropriate orthonormal basis. Let
$p(x) = \sum_{S \in \mathcal{S}} \hat{p}(S) H_S(x)$ be the Hermite expansion of p; $\mathcal{S}$ is a family of multi-indices where each
$\{H_S\}_{S \in \mathcal{S}}$ has degree at most d. By orthonormality of the basis we have that

$$\|p\|_2^2 = \sum_{S \in \mathcal{S}} \hat{p}(S)^2.$$

Note that $q(x, z) = \sum_{S \in \mathcal{S}} \hat{p}(S)(H_S(x) - H_S(y))$ and

$$q^2(x, z) = \sum_{S \in \mathcal{S}} \hat{p}(S)^2 (H_S(x) - H_S(y))^2 + \sum_{S, T \in \mathcal{S},\, S \ne T} \hat{p}(S)\hat{p}(T)(H_S(x) - H_S(y))(H_T(x) - H_T(y)).$$

Let us denote the second summand in the above expression by $q'(x, z)$. We will first show
that

$$\mathbf{E}_{x,z}[q'(x, z)] = 0.$$

By linearity of expectation we can write

$$\mathbf{E}_{x,z}[q'(x, z)] = \sum_{S, T \in \mathcal{S},\, S \ne T} \hat{p}(S)\hat{p}(T)\, \mathbf{E}_{x,z}\big[(H_S(x) - H_S(y))(H_T(x) - H_T(y))\big].$$

Hence, it suffices to show that for all $S \ne T$ we have

$$\mathbf{E}_{x,z}\big[(H_S(x) - H_S(y))(H_T(x) - H_T(y))\big] = 0.$$

By orthogonality of the Hermite basis, and the fact that y is distributed according to $\mathcal{N}^n$,
the above expression equals

$$-\mathbf{E}_{x,z}[H_S(x)H_T(y)] - \mathbf{E}_{x,z}[H_S(y)H_T(x)].$$

Thus, the desired result follows from the following lemma:

Lemma 4.2 For all $S \ne T$ it holds

$$\mathbf{E}_{x,z}[H_S(x)H_T(y)] = 0.$$
Proof: Since $S \neq T$, it suffices to prove the result for univariate Hermite polynomials. The
result for the multivariate case then follows by independence. That is, for $x_1, z_1 \sim \mathcal{N}(0,1)$
and $s \neq t \in [d]$, we need to show that

\[ E_{x_1,z_1}[h_s(x_1) h_t(\alpha x_1 + \beta z_1)] = 0. \]

Since $\alpha^2 + \beta^2 = 1$, we have that the joint distribution of $(x_1, \alpha x_1 + \beta z_1)$ is identical to
the joint distribution of $(\alpha x_1 + \beta z_1, x_1)$, and thus we can assume wlog that $s > t$. Since
$h_t(\alpha x_1 + \beta z_1)$ is a degree-$t$ polynomial in $x_1, z_1$ it can be written in the form

\[ \sum_{i,j=0}^{t} c_{ij} h_i(x_1) h_j(z_1) \]

for some real coefficients $c_{ij}$. Hence, by linearity of expectation and independence, the desired
expectation is

\[ \sum_{i,j=0}^{t} c_{ij}\, E[h_i(x_1) h_s(x_1)] \cdot E[h_j(z_1)] \]

which equals 0 by orthogonality of the Hermite basis (every index $i \le t < s$). ■
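
Lemma 4.2 can be checked numerically in the univariate case. The sketch below is an illustration only: it assumes $\alpha = 1 - \epsilon$ and $\beta = \sqrt{2\epsilon - \epsilon^2}$ (so that $\alpha^2 + \beta^2 = 1$), takes $h_k$ to be the normalized probabilists' Hermite polynomial, and uses Gauss-Hermite quadrature from NumPy; the helper names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def h(k, x):
    """Normalized probabilists' Hermite polynomial h_k = He_k / sqrt(k!)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(k))

def correlated_hermite_expectation(s, t, eps, nodes=40):
    """E[h_s(x) * h_t(alpha*x + beta*z)] for independent x, z ~ N(0, 1),
    computed with a tensor-product Gauss-Hermite rule (exact for the
    low-degree polynomials used here, up to rounding)."""
    alpha = 1.0 - eps
    beta = math.sqrt(2.0 * eps - eps**2)
    pts, wts = hermegauss(nodes)
    wts = wts / math.sqrt(2.0 * math.pi)   # normalize weights to a probability measure
    X, Z = np.meshgrid(pts, pts, indexing="ij")
    W = np.outer(wts, wts)
    return float(np.sum(W * h(s, X) * h(t, alpha * X + beta * Z)))

# Off-diagonal expectations vanish; the diagonal ones equal alpha**s.
off_diagonal = max(abs(correlated_hermite_expectation(s, t, 0.1))
                   for s in range(6) for t in range(6) if s != t)
diagonal = correlated_hermite_expectation(3, 3, 0.1)
print(off_diagonal, diagonal)
```

The diagonal value $\alpha^s$ is the standard correlation identity for Hermite polynomials and is included only as a second consistency check.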
At this point, we need the following claim whose proof is deferred to the following subsection:

Claim 4.3 Let $H_d(x)$ be a degree-$d$ multivariate Hermite polynomial. Then

\[ \|H_d(x) - H_d(y)\|_2 = O(d \cdot \sqrt{\epsilon}). \]

Repeated applications of Claim 4.3 now yield

\[ \|q\|_2^2 \le \sum_{S \in \mathcal{S}} \hat{p}(S)^2 \cdot O(d^2 \cdot \epsilon) = O(d^2 \cdot \epsilon), \]

concluding the proof.

4.1.1 Proof of Claim 4.3 

We can assume wlog that

\[ H_d(x) = \prod_{i=1}^{j} h_{k_i}(x_i) \]

where $j \in [d]$, $k_i \ge 1$, and $\sum_{i=1}^{j} k_i = d$.
For $i \in [j]$, we denote $\Delta h_{k_i}(x_i, y_i) = h_{k_i}(y_i) - h_{k_i}(x_i)$. Then we can write

\[ H_d(y) = \prod_{i=1}^{j} h_{k_i}(y_i) = \prod_{i=1}^{j} \left( h_{k_i}(x_i) + \Delta h_{k_i}(x_i, y_i) \right). \]

We will need the following claim whose proof lies in the next subsection:

Claim 4.4 Let $h_d(x)$ be a degree-$d$ univariate Hermite polynomial. Then

\[ \|\Delta h_d(x,y)\|_2 = \|h_d(x) - h_d(y)\|_2 \le 8\sqrt{d} \cdot \sqrt{\epsilon}. \]

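The triangle inequality for norms combined with independence now yields

\[ \|H_d(x) - H_d(y)\|_2 \le \sum_{\emptyset \neq I \subseteq [j]} \ \prod_{i \in I} \|\Delta h_{k_i}(x_i,y_i)\|_2 \prod_{i \notin I} \|h_{k_i}(x_i)\|_2. \]

Noting that $\|h_{k_i}(x_i)\|_2 = 1$ for all $i$, and $\|\Delta h_{k_i}(x_i,y_i)\|_2 \le 8\sqrt{k_i} \cdot \sqrt{\epsilon}$ by Claim 4.4 above, we
obtain

\[ \|H_d(x) - H_d(y)\|_2 \le \sum_{i=1}^{j} \|\Delta h_{k_i}(x_i,y_i)\|_2 + \sum_{I \subseteq [j],\, |I| \ge 2} \ \prod_{i \in I} \|\Delta h_{k_i}(x_i,y_i)\|_2 \]

\[ \le 8 \Big( \sum_{i=1}^{j} \sqrt{k_i} \Big) \cdot \sqrt{\epsilon} + \sum_{|I|=2}^{j} \binom{j}{|I|} \left( 8\sqrt{d}\sqrt{\epsilon} \right)^{|I|} = O(d\sqrt{\epsilon}), \]

where the last step follows from the elementary bound $(1 + 8\sqrt{d}\sqrt{\epsilon})^j \le 1 + 8d^{3/2}\sqrt{\epsilon} +
O(d^3\epsilon)$ and the fact that $\epsilon \le 2^{-2d}$. This completes the proof of Claim 4.3.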



4.1.2 Proof of Claim 4.4 

We will need a crucial lemma:

Lemma 4.5 For all $k \in [d]$ we have

\[ \|h_k(x) - h_k(x - \epsilon x)\|_2 \le 3k\epsilon. \]
Proof: Note that $h_k(x - \epsilon x)$ is a degree-$k$ polynomial in $x$. Hence, by Taylor's theorem we
deduce

\[ h_k(x) - h_k(x - \epsilon x) = -\sum_{i=1}^{k} \frac{1}{i!}\, h_k^{(i)}(x) (-\epsilon x)^i. \]

The triangle inequality for norms now yields

\[ \|h_k(x) - h_k(x - \epsilon x)\|_2 \le \sum_{i=1}^{k} \frac{\epsilon^i}{i!} \cdot \|h_k^{(i)}(x) x^i\|_2. \]

It thus suffices to bound the term $\|h_k^{(i)}(x) x^i\|_2$. Recalling that $(h_k^{(i)}(x))^2 = i! \binom{k}{i} (h_{k-i}(x))^2$
we have

\[ E_x[(h_k^{(i)}(x))^2 x^{2i}] = i! \binom{k}{i}\, E_x[h_{k-i}^2(x) x^{2i}]. \]

For $i = 1$, using the well-known relation

\[ \sqrt{k}\, h_k(x) + \sqrt{k-1}\, h_{k-2}(x) = x h_{k-1}(x) \]
and the orthonormality of the $h_i$'s, an easy calculation gives $E_x[h_{k-1}^2(x) x^2] = 2k - 1$; hence,

\[ \|h_k'(x) x\|_2 \le \sqrt{2}\, k. \]
For $i > 1$, by Cauchy-Schwarz we get

\[ E_x[h_{k-i}^2(x) x^{2i}] \le \sqrt{E_x[h_{k-i}^4(x)] \cdot E_x[x^{4i}]}. \]

We now proceed to bound the RHS. By hypercontractivity, the first term can be bounded
as follows

\[ \|h_{k-i}\|_4^4 \le 9^{k-i} \|h_{k-i}\|_2^4 = 9^{k-i}. \]

For the second term we recall that, for $x \sim \mathcal{N}$, we have $E_x[x^{2i}] = (2i)!/(2^i i!)$. Using the
elementary inequality $(2j)!/j! \le 2^{2j} j!$ we conclude

\[ E_x[h_{k-i}^2(x) x^{2i}] \le 3^{k-i} \cdot 2^i \sqrt{(2i)!} \le 3^k \cdot (4/3)^i \cdot i! \le 4^k\, i! \]

hence,

\[ \|h_k^{(i)}(x) x^i\|_2 \le \sqrt{\binom{k}{i} 4^k} \cdot i! \le 2^{3k/2} \cdot i!. \]

Therefore,

\[ \|h_k(x) - h_k(x - \epsilon x)\|_2 \le \sqrt{2}\, k \cdot \epsilon + \epsilon \cdot 2^{3k/2} \cdot \sum_{i=1}^{k-1} \epsilon^i \le 3k \cdot \epsilon \]

where we used the fact $\epsilon \le 2^{-2d} \le 2^{-2k}$ which yields $\sum_{i=1}^{k-1} \epsilon^i \le \sum_{i=1}^{\infty} 2^{-2ki} \le 2^{-2k+1}$. The
proof of the lemma is now complete. ■
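
The bound of Lemma 4.5 can be sanity-checked numerically for small $k$. This is only an illustrative sketch: it assumes $h_k$ is the normalized probabilists' Hermite polynomial and takes $\epsilon = 2^{-2k}$, the largest value allowed by the assumption $\epsilon \le 2^{-2k}$ used in the proof; the helper names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def h(k, x):
    """Normalized probabilists' Hermite polynomial h_k = He_k / sqrt(k!)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(k))

def shrink_norm(k, eps, nodes=60):
    """|| h_k(x) - h_k((1 - eps) * x) ||_2 for x ~ N(0, 1), via Gauss-Hermite quadrature."""
    pts, wts = hermegauss(nodes)
    wts = wts / math.sqrt(2.0 * math.pi)   # normalize weights to a probability measure
    diff = h(k, pts) - h(k, (1.0 - eps) * pts)
    return math.sqrt(float(np.sum(wts * diff**2)))

# Map k -> (left-hand side, right-hand side 3*k*eps) with eps = 2^(-2k).
checks = {k: (shrink_norm(k, 2.0 ** (-2 * k)), 3 * k * 2.0 ** (-2 * k))
          for k in range(1, 7)}
for k, (lhs, rhs) in checks.items():
    print(k, lhs, rhs)
```

For $k = 1$ the left-hand side is exactly $\epsilon$, since $h_1(x) - h_1(x - \epsilon x) = \epsilon x$.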
We now proceed to complete the proof of our claim. Let us write

\[ \Delta h_d(x,y) = h_d(x) - h_d(y) = q_1(x) + q_2(x,z) \]

where $q_1(x) = h_d(x) - h_d(x - \epsilon x)$ and $q_2(x,z) = h_d(x - \epsilon x) - h_d(x - \epsilon x + \beta z)$.
By the triangle inequality for norms it holds that

\[ \|\Delta h_d(x,y)\|_2 \le \|q_1\|_2 + \|q_2\|_2 \]

hence it suffices to bound each of the terms in the RHS.
By Lemma 4.5 it follows that

\[ \|q_1\|_2 \le 3d\epsilon. \]

For the second term, we will show that

\[ \|q_2\|_2 \le 5\sqrt{d} \cdot \sqrt{\epsilon}. \]

Note that this suffices to complete the proof, since by our assumption on $\epsilon$, we have $d \cdot \epsilon \le 1$,
which implies that

\[ \|\Delta h_d(x,y)\|_2 \le 8\sqrt{d}\sqrt{\epsilon} \]

as desired.

Now observe that $h_d(x - \epsilon x + \beta z)$ is a degree-$d$ polynomial in $x, z$. Let us denote $x' = (1 - \epsilon)x$.
By Taylor's theorem we can write

\[ h_d(x' + \beta z) = h_d(x') + \sum_{i=1}^{d} \frac{\beta^i}{i!}\, h_d^{(i)}(x') z^i \]

or

\[ q_2(x,z) = -\sum_{i=1}^{d} \frac{\beta^i}{i!}\, h_d^{(i)}(x') z^i. \]

By triangle inequality

\[ \|q_2\|_2 \le \sum_{i=1}^{d} \frac{\beta^i}{i!}\, \|h_d^{(i)}(x') z^i\|_2. \]

For the terms in the RHS by independence we get

\[ \|h_d^{(i)}(x') z^i\|_2 = \|h_d^{(i)}(x')\|_2 \cdot \|z^i\|_2. \]

For the second term above we have that $\|z^i\|_2 \le 2^{i/2} \cdot \sqrt{i!}$.
Recalling that $h_d^{(i)}(x')^2 = i! \binom{d}{i} h_{d-i}(x')^2$, for the first term we have

\[ \|h_d^{(i)}(x')\|_2 = \sqrt{i!} \cdot \sqrt{\binom{d}{i}} \cdot \|h_{d-i}(x')\|_2. \]

Since $x' = x - \epsilon x$ we apply Lemma 4.5 for $k = d - i$ and get

\[ \|h_{d-i}(x')\|_2 \le \|h_{d-i}(x)\|_2 + 3(d - i)\epsilon \le 2 \]
where the second inequality uses the assumption on the range of $\epsilon$.
Therefore, 




This completes the proof of Claim 4.4. 

5 Boolean Average Sensitivity 

Let $\mathrm{AS}(n,d)$ denote the maximum possible average sensitivity of any degree-$d$ PTF over $n$
Boolean variables. In this section we prove the claimed bound in Theorem 1.1:

\[ \mathrm{AS}(n,d) \le 2^{O(d)} \cdot \log n \cdot n^{1 - 1/(4d+2)} \qquad (2) \]

For $d = 1$ (linear threshold functions) it is well known that $\mathrm{AS}(n,1) = \Theta(\sqrt{n})$.
Also, notice that the RHS of (2) is larger than $n$ for $d = \omega(\sqrt{\log n})$, yielding a trivial bound
of $\mathrm{AS}(n,d) \le n$. Therefore throughout this section we shall assume $d$ satisfies $2 \le d \le
O(\sqrt{\log n})$.

5.1 Overview of proof 

The high-level approach to proving Theorem 1.1 is a combination of a case analysis and a 
recursive bound. 

For certain types of PTFs ("$\tau$-regular" PTFs; see Section 5.2 for a precise definition) we
argue directly that the average sensitivity is small, using arguments similar to the Gaussian
case together with the invariance principle. In particular, we show:

Claim 5.1 Suppose $f = \mathrm{sign}(p)$ is a $\tau$-regular degree-$d$ PTF where $\tau := n^{-(4d+1)/(4d+2)}$.
Then,

\[ \mathrm{AS}(f) \le O(d \cdot n^{1 - 1/(4d+2)}). \]
Claim 5.1 follows directly from Lemma 5.8, which we prove in Section 5.4.

For PTFs that are not $\tau$-regular, we show that there is a not-too-large value of $k$ (at most
$K := 2d \log n/\tau$), and a collection of $k$ variables (the variables whose influences in $p$ are
largest), such that the following holds: if we consider all $2^k$ subfunctions of $f$ obtained by
fixing the variables in all possible ways, a "large" (at least $1/2^{O(d)}$) fraction of the restricted
functions have low average sensitivity. More precisely, we show:

Claim 5.2 Let $K := 2d \log n/\tau$ where $\tau := n^{-(4d+1)/(4d+2)}$. Suppose $f = \mathrm{sign}(p)$ is a degree-$d$
PTF that is not $\tau$-regular. Then for some $1 \le k \le K$, there is a set of $k$ variables with the
following property: for at least a $1/2^{O(d)}$ fraction of all $2^k$ assignments $\rho$ to those $k$ variables,
we have

\[ \mathrm{AS}(f_\rho) \le O(d \cdot (\log n)^{1/4} \cdot n^{1 - 1/(4d+2)}). \]

The proof of Claim 5.2 is given in Section 5.7. We do this by generalizing the "critical index"
case analysis from [Ser07]. We define a notion of the $\tau$-critical index of a degree-$d$ polynomial;
a $\tau$-regular polynomial $p$ is one for which the $\tau$-critical index is 0. If the $\tau$-critical index of $p$
is some value $k \le 2d \log n/\tau$, we restrict the $k$ largest-influence variables (see Section 5.5). If
the $\tau$-critical index is larger than $2d \log n/\tau$, we restrict the $k = 2d \log n/\tau$ largest-influence
variables in $p$ (see Section 5.6).



5.1.1 Proof of main result (Theorem 1.1) assuming Claim 5.1 and Claim 5.2 

Given these two claims it is not difficult to obtain the final result. In Claim 5.2, we note
that the $k$ restricted variables may each contribute at most 1 to the average sensitivity of $f$
(recall that average sensitivity is equal to the sum of influences of each variable), and that the
total influence of the remaining variables on $f$ is equal to the expected average sensitivity
of $f_\rho$, where the expectation is taken over all $2^k$ restrictions $\rho$. Since each function $f_\rho$ is
itself a degree-$d$ PTF over at most $n$ variables, we have the following recursive constraint on
$\mathrm{AS}(n,d)$:

\[ \mathrm{AS}(n,d) \le \max\Big\{ O(d \cdot n^{1 - 1/(4d+2)}),\ \max_{1/2^{O(d)} \le \alpha \le 1} \big\{ k + \alpha \cdot O(d \cdot (\log n)^{1/4} \cdot n^{1 - 1/(4d+2)}) + (1 - \alpha)\, \mathrm{AS}(n,d) \big\} \Big\}. \]

It is easy to see that the maximum possible value of $\mathrm{AS}(n,d)$ subject to the above constraint
is at most the maximum possible value of $\mathrm{AS}'(n,d)$ that satisfies the following weaker
constraint:

\[ \mathrm{AS}'(n,d) \le K + \left( 1 - \frac{1}{2^{O(d)}} \right) \cdot \mathrm{AS}'(n,d) \]

which is satisfied by $\mathrm{AS}'(n,d) \le 2^{O(d)} \cdot \log n \cdot n^{1 - 1/(4d+2)}$.
5.2 Regularity and the critical index of polynomials

In [Ser07] a notion of the "critical index" of a linear form was defined and subsequently
used in [OS08, DS09, DGJ+09]. We now give a generalization of the critical index notion for
polynomials.

Definition 2 Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Assume the variables are ordered such that
$\mathrm{Inf}_i(p) \ge \mathrm{Inf}_{i+1}(p)$ for all $i \in [n-1]$. The $\tau$-critical index of $p$ is the least $i$ such that:

\[ \frac{\mathrm{Inf}_{i+1}(p)}{\sum_{j=i+1}^{n} \mathrm{Inf}_j(p)} \le \tau. \qquad (3) \]

If (3) does not hold for any $i$ we say that the $\tau$-critical index of $p$ is $+\infty$. If $p$ has $\tau$-critical
index 0, we say that $p$ is $\tau$-regular.

The following simple lemma will be useful for us. It says that the total influence $\sum_{i=j+1}^{n} \mathrm{Inf}_i(p)$
goes down exponentially as a function of $j$ prior to the critical index:

Lemma 5.3 Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Let $k$ be the $\tau$-critical index of $p$. For
$0 \le j \le k$ we have

\[ \sum_{i=j+1}^{n} \mathrm{Inf}_i(p) \le (1 - \tau)^j \cdot \mathrm{Inf}(p). \]

Proof: The lemma trivially holds for $j = 0$. In general, since $j$ is at most $k$, we have that

\[ \mathrm{Inf}_j(p) \ge \tau \cdot \sum_{i=j}^{n} \mathrm{Inf}_i(p), \]

or equivalently

\[ \sum_{i=j+1}^{n} \mathrm{Inf}_i(p) \le (1 - \tau) \cdot \sum_{i=j}^{n} \mathrm{Inf}_i(p) \]

which yields the claimed bound. ■
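
The definition of the critical index and the decay stated in Lemma 5.3 can be made concrete with a short sketch. The code below is an illustration under the reading of (3) as $\mathrm{Inf}_{i+1}(p) \le \tau \cdot \sum_{j > i} \mathrm{Inf}_j(p)$; it works directly on a list of influences rather than on a polynomial, and the function names and the example profile are hypothetical.

```python
import math
from fractions import Fraction

def critical_index(influences, tau):
    """tau-critical index of a list of influences: the least i such that
    Inf_{i+1} <= tau * sum_{j > i} Inf_j after sorting in non-increasing
    order, or math.inf if no such i exists."""
    infs = sorted(influences, reverse=True)
    tail = sum(infs)                   # sum_{j=i+1}^n Inf_j for i = 0
    for i, inf in enumerate(infs):     # inf plays the role of Inf_{i+1}
        if inf <= tau * tail:
            return i
        tail -= inf
    return math.inf

def lemma_5_3_holds(influences, tau):
    """Check sum_{i > j} Inf_i <= (1 - tau)^j * Inf(p) for all 0 <= j <= k."""
    infs = sorted(influences, reverse=True)
    k = critical_index(infs, tau)
    upto = len(infs) if k == math.inf else k
    total = sum(infs)
    return all(sum(infs[j:]) <= (1 - tau) ** j * total for j in range(upto + 1))

# A "head" of 10 geometrically decaying influences followed by a flat tail.
tau = Fraction(1, 100)
example = [Fraction(1, 2 ** j) for j in range(1, 11)] + [Fraction(1, 10 ** 6)] * 1000
print(critical_index(example, tau), lemma_5_3_holds(example, tau))
```

Exact rational arithmetic is used so that the comparison in (3) is not affected by floating-point rounding; a flat profile such as four equal influences with $\tau = 1/4$ has critical index 0, i.e. it is $\tau$-regular.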

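Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. We note here that the total influence of $p$
is within a factor of $d$ of the sum of squares of the non-constant coefficients of $p$:

\[ \sum_{S \neq \emptyset} \hat{p}(S)^2 \le \sum_{i=1}^{n} \mathrm{Inf}_i(p) = \sum_{i=1}^{n} \sum_{S \ni i} \hat{p}(S)^2 = \sum_{S \subseteq [n]} |S|\, \hat{p}(S)^2 \le d \cdot \sum_{S \neq \emptyset} \hat{p}(S)^2 \]

where the final inequality holds since $\hat{p}(S) \neq 0$ only for sets $|S| \le d$.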
5.3 Restrictions and the influences of variables in polynomials

Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. The goal of this section is to understand
what happens to the influence of a variable $x_\ell$, $\ell > k$, when we do a random restriction to
variables $x_1, \ldots, x_k$.

We start with the following elementary claim:

Claim 5.4 Let $\rho$ be a randomly chosen assignment to the variables $x_1, \ldots, x_k$. Fix any
$S \subseteq \{k+1, \ldots, n\}$. Then for any polynomial $p : \{-1,1\}^n \to \mathbb{R}$ we have

\[ \widehat{p_\rho}(S) = \sum_{T \subseteq [k]} \hat{p}(S \cup T)\, \rho_T \]

and so we have

\[ E_\rho[\widehat{p_\rho}(S)^2] = \sum_{T \subseteq [k]} \hat{p}(S \cup T)^2. \qquad (4) \]

In words, all the Fourier weight on sets of the form $S \cup \{$some restricted variables$\}$ "collapses"
down onto $S$ in expectation. A corollary of this is that in expectation, the influence of an
unrestricted variable $x_\ell$ does not change when we do a restriction:

Corollary 5.5 Let $\rho$ be a randomly chosen assignment to the variables $x_1, \ldots, x_k$. Fix any
$\ell \in \{k+1, \ldots, n\}$. Then for any polynomial $p : \{-1,1\}^n \to \mathbb{R}$ we have

\[ E_\rho[\mathrm{Inf}_\ell(p_\rho)] = \mathrm{Inf}_\ell(p). \]



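Proof:

\[ E_\rho[\mathrm{Inf}_\ell(p_\rho)] = E_\rho\Big[ \sum_{\ell \in S \subseteq \{k+1, \ldots, n\}} \widehat{p_\rho}(S)^2 \Big] = \sum_{T \subseteq [k]} \ \sum_{\ell \in S \subseteq \{k+1, \ldots, n\}} \hat{p}(S \cup T)^2 = \sum_{U \ni \ell} \hat{p}(U)^2 = \mathrm{Inf}_\ell(p). \]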
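
Corollary 5.5 is easy to confirm by brute force on a small example. The sketch below is illustrative only: it stores a multilinear polynomial as a dictionary from variable sets to Fourier coefficients (a hypothetical representation chosen for this sketch), restricts the first $k$ variables in all $2^k$ ways, and compares the averaged influence with the original one.

```python
import itertools
import random
from collections import defaultdict

def influence(coeffs, var):
    """Inf_var(p): sum of squared Fourier coefficients over sets containing var."""
    return sum(c * c for S, c in coeffs.items() if var in S)

def restrict(coeffs, rho):
    """Fourier coefficients of p_rho, where variables 0..len(rho)-1 are fixed to rho."""
    k = len(rho)
    out = defaultdict(float)
    for S, c in coeffs.items():
        sign = 1
        for i in S:
            if i < k:
                sign *= rho[i]          # a restricted variable becomes a constant +-1
        out[frozenset(i for i in S if i >= k)] += c * sign
    return dict(out)

def expected_restricted_influence(coeffs, k, var):
    """Average of Inf_var(p_rho) over all 2^k assignments rho to the first k variables."""
    total = 0.0
    for rho in itertools.product((-1, 1), repeat=k):
        total += influence(restrict(coeffs, rho), var)
    return total / 2 ** k

# A random degree-3 polynomial on 6 variables (0-indexed), restricting the first 3.
random.seed(1)
n, d, k = 6, 3, 3
coeffs = {frozenset(S): random.uniform(-1, 1)
          for r in range(d + 1) for S in itertools.combinations(range(n), r)}
for var in range(k, n):
    print(var, expected_restricted_influence(coeffs, k, var), influence(coeffs, var))
```

Individual restrictions can raise or lower a given influence; only the average over $\rho$ is preserved, which is why Lemma 5.6 is needed to control the deviation.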



5.3.1 Influences of low-degree polynomials behave nicely under restrictions 

In this subsection we prove the following lemma: for a low-degree polynomial, a random
restriction with very high probability does not cause any variable's influence to increase by
more than a polylog($n$) factor.
Lemma 5.6 Let $p(x_1, \ldots, x_n)$ be a degree-$d$ polynomial. Let $\rho$ be a randomly chosen assignment
to the variables $x_1, \ldots, x_k$. Fix any $t > e^{2d}$ and any $\ell \in [k+1, n]$. With probability at
least $1 - \exp(-\Omega(t^{1/d}))$ over the choice of $\rho$, we have

\[ \mathrm{Inf}_\ell(p_\rho) \le t \cdot 3^d\, \mathrm{Inf}_\ell(p). \]

In particular, for $t = \log^d n$, we have that with probability at least $1 - n^{-\Omega(1)}$, every variable
$\ell \in [k+1, n]$ has $\mathrm{Inf}_\ell(p_\rho) \le (3 \log n)^d \cdot \mathrm{Inf}_\ell(p)$.

Proof: Since $\mathrm{Inf}_\ell(p_\rho)$ is a degree-$2d$ polynomial in $\rho$, Lemma 5.6 follows as an immediate
consequence of Theorem 2.2 if we can upper bound $\|\mathrm{Inf}_\ell(p_\rho)\|_2$. We use the bound in
Lemma 5.7, stated and proven below. ■

Lemma 5.7 Let $p(x_1, \ldots, x_n)$ be a degree-$d$ polynomial. Let $\rho$ be a randomly chosen assignment
to the variables $x_1, \ldots, x_k$, and let $\ell \in [k+1, n]$. Then $\mathrm{Inf}_\ell(p_\rho)$ is a degree-$2d$
polynomial in variables $\rho_1, \ldots, \rho_k$, and

\[ \|\mathrm{Inf}_\ell(p_\rho)\|_2 \le 3^d \cdot \mathrm{Inf}_\ell(p). \]

Proof: The triangle inequality tells us that we may bound the 2-norm of each squared
coefficient separately:

\[ \|\mathrm{Inf}_\ell(p_\rho)\|_2 \le \sum_{\ell \in S \subseteq [k+1,n]} \|\widehat{p_\rho}(S)^2\|_2. \]

Since $\widehat{p_\rho}(S)$ is a degree-$d$ polynomial, Bonami-Beckner (i.e., (4, 2)-hypercontractivity) tells
us that

\[ \|\widehat{p_\rho}(S)^2\|_2 = \|\widehat{p_\rho}(S)\|_4^2 \le 3^d\, \|\widehat{p_\rho}(S)\|_2^2, \]

hence

\[ \|\mathrm{Inf}_\ell(p_\rho)\|_2 \le 3^d \sum_{\ell \in S \subseteq [k+1,n]} \|\widehat{p_\rho}(S)\|_2^2 = 3^d \cdot \mathrm{Inf}_\ell(p) \]

where the last equality is by Corollary 5.5. ■
5.4 The regular case 

In this section we prove that regular degree-$d$ PTFs have low average sensitivity. In particular,
we show:

Lemma 5.8 Fix $\tau = n^{-\Theta(1)}$. Let $f$ be a $\tau$-regular degree-$d$ PTF. Then,

\[ \mathrm{AS}(f) \le O(d \cdot n \cdot \tau^{1/(4d+1)}). \]
Claim 5.1 follows directly from the above lemma, recalling we choose $\tau = n^{-(4d+1)/(4d+2)}$.
However, the lemma will also be useful in the "small critical index" case for a slightly larger
regularity parameter $\tau'$.

Proof: Let $f = \mathrm{sign}(p)$ be a degree-$d$ PTF over $\{-1,1\}^n$, where $p$ is $\tau$-regular. We
may assume that $p$ is normalized such that $\sum_{0 < |S| \le d} \hat{p}(S)^2 = 1$.

First we note that flipping the $i$-th bit of an input $x \in \{-1,1\}^n$ changes the value of $p$ by
the magnitude of its partial derivative with respect to $i$:

\[ 2D_i p(x) = 2 \sum_{S \ni i} \hat{p}(S)\, x_{S \setminus \{i\}}. \]

It follows that:

\[ \mathrm{Inf}_i(f) \le \Pr_{x \in \{-1,1\}^n}[\, |p(x)| \le |2D_i p(x)| \,]. \]

Therefore, bounding from above the influence of variable $i$ in $f$ can be done by showing the
following:

1. $p(x)$ has small magnitude, $|p(x)| \le t$ for some threshold $t$, with small probability.

2. $2D_i p(x)$ has large magnitude, $|2D_i p(x)| \ge t$, with small probability.

We bound the probability of the first event using the anti-concentration property of regular
low-degree polynomials, as implied by the invariance principle along with Theorem 2.3. For
the second event we use the tail bound for degree-$d$ polynomials (Theorem 2.2).

We will take our threshold $t$ to be $t := \tau^{1/4}$, where $\tau$ is the regularity parameter of $p$.
5.4.1 Bounding the probability of the first event 

By the $\tau$-regularity of $p$, for all $i \in [n]$ we have $\mathrm{Inf}_i(p) \le \tau \cdot \mathrm{Inf}(p) \le d \cdot \tau$ where the last
inequality follows by the assumed normalization. With this bound, the invariance principle
(Theorem 2.5) tells us that $\Pr_{x \in \{-1,1\}^n}[|p(x)| \le \tau^{1/4}]$ differs from $\Pr_{G_1, \ldots, G_n}[|p(G)| \le \tau^{1/4}]$
by at most $O(d \cdot (d\tau)^{1/(4d+1)}) = O(d \cdot \tau^{1/(4d+1)})$. Applying the anti-concentration bound of
Carbery and Wright for polynomials in Gaussian random variables (Theorem 2.3), we get:

\[ \Pr_x[|p(x)| \le \tau^{1/4}] \le \Pr_{G_1, \ldots, G_n}[|p(G)| \le \tau^{1/4}] + O(d\, \tau^{1/(4d+1)}) \]
\[ \le O(d \cdot \tau^{1/(4d)}) + O(d \cdot \tau^{1/(4d+1)}) \]
\[ = O(d \cdot \tau^{1/(4d+1)}). \]
5.4.2 Bounding the probability of the second event

Next we consider $\Pr_x[|2D_i p(x)| \ge \tau^{1/4}]$. Note that $2D_i p$ is a degree-$(d-1)$ polynomial whose
$\ell_2$ norm is small:

\[ \|2D_i p\|_2 = 2\sqrt{\sum_{S \ni i} \hat{p}(S)^2} = 2\sqrt{\mathrm{Inf}_i(p)} \le 2\sqrt{d\tau}. \]
By Theorem 2.2, we get that

\[ \Pr_x[|2D_i p(x)| \ge \tau^{1/4}] \le \Pr_x\left[ |2D_i p(x)| \ge \frac{\tau^{-1/4}}{2\sqrt{d}} \cdot \|2D_i p\|_2 \right] \]

\[ \le \exp\left( -\Omega\left( \tau^{-1/(2d)} / (2\sqrt{d})^{2/d} \right) \right) = \exp\left( -\Theta(1) \cdot \tau^{-1/(2d)} \right) \ll O(d \cdot \tau^{1/(4d+1)}). \]

(In the second inequality, we were able to apply the concentration bound since, by our
assumptions on $d$ and $\tau$, we indeed have that $\tau^{-1/4}/(2\sqrt{d}) > e^d$.)

Hence, we have shown that:

\[ \mathrm{Inf}_i(f) \le \Pr_{x \in \{-1,1\}^n}[\, |p(x)| \le |2D_i p(x)| \,] \]

\[ \le \Pr_x[|p(x)| \le \tau^{1/4}] + \Pr_x[|2D_i p(x)| \ge \tau^{1/4}] \]
\[ = O(d \cdot \tau^{1/(4d+1)}). \]

Since this holds for all indices $i \in [n]$, we have the following bound on the average sensitivity
of $f = \mathrm{sign}(p)$:

\[ \mathrm{AS}(f) \le O(d \cdot n \cdot \tau^{1/(4d+1)}). \]


5.5 The small critical index case 

Let $f = \mathrm{sign}(p)$ be such that the $\tau$-critical index of $p$ is some value $k$ between 1 and
$K = 2d \log n/\tau$. By definition, the sequence of influences $\mathrm{Inf}_{k+1}(p), \ldots, \mathrm{Inf}_n(p)$ is $\tau$-regular.
We essentially reduce this case to the regular case for a regularity parameter $\tau'$ somewhat
larger than $\tau$.

Consider a random restriction $\rho$ of all the variables up to the critical index. We will show
the following:

Lemma 5.9 For a $1/2^{O(d)}$ fraction of restrictions $\rho$, the sequence of influences $\mathrm{Inf}_{k+1}(p_\rho),
\ldots, \mathrm{Inf}_n(p_\rho)$ is $\tau'$-regular, where $\tau' = (3 \log n)^d \cdot \tau$.

By our choice of $\tau = n^{-(4d+1)/(4d+2)}$, we have that $\tau' = n^{-\Omega(1)}$, and so we may apply
Lemma 5.8 to these restrictions to conclude that the associated PTFs have average sensitivity
at most $O(d \cdot n \cdot (\tau')^{1/(4d+1)})$.

Proof:

Since the sequence of influences $\mathrm{Inf}_{k+1}(p), \ldots, \mathrm{Inf}_n(p)$ is $\tau$-regular, we have

\[ \frac{\mathrm{Inf}_i(p)}{\sum_{j=k+1}^{n} \mathrm{Inf}_j(p)} \le \tau \]

for all $i \in [k+1, n]$.

We want to prove that for a $1/2^{O(d)}$ fraction of all $2^k$ restrictions $\rho$ to $x_1, \ldots, x_k$ we have

\[ \frac{\mathrm{Inf}_i(p_\rho)}{\sum_{j=k+1}^{n} \mathrm{Inf}_j(p_\rho)} \le \tau' \]
for all $i \in [k+1, n]$.

To do this we proceed as follows: Lemma 5.6 implies that, with very high probability over
the random restrictions, we have $\mathrm{Inf}_i(p_\rho) \le (3 \log n)^d \cdot \mathrm{Inf}_i(p)$ for all $i \in [k+1, n]$. We need
to show that for a $1/2^{O(d)}$ fraction of all restrictions the denominator of the fraction above
is at least $\sum_{j=k+1}^{n} \mathrm{Inf}_j(p)$ (its expected value). The lemma then follows by a union bound.

We consider the degree-$2d$ polynomial $A(\rho_1, \ldots, \rho_k) := \sum_{j=k+1}^{n} \mathrm{Inf}_j(p_\rho)$ in variables $\rho_1, \ldots, \rho_k$.
The expected value of $A$ is $E_\rho[A] = \sum_{j=k+1}^{n} \mathrm{Inf}_j(p) = \hat{A}(\emptyset)$. We apply Theorem 2.4 for
$B = A - \hat{A}(\emptyset)$ and get $\Pr_\rho[B \ge 0] \ge 1/2^{O(d)}$, that is, $\Pr_\rho[A \ge E_\rho[A]] \ge 1/2^{O(d)}$,
and we are done. ■



5.6 The large critical index case 

Finally we consider PTFs $f = \mathrm{sign}(p)$ with $\tau$-critical index greater than $K = 2d \log n/\tau$. Let
$\rho$ be a restriction of the first $K$ variables $\mathcal{H} = \{1, \ldots, K\}$; we call these the "head" variables.
We will show the following:

Lemma 5.10 For a $1/2^{O(d)}$ fraction of restrictions $\rho$, the function $\mathrm{sign}(p_\rho(x))$ is a constant
function.

Proof: By Lemma 5.3, the surviving variables $x_{K+1}, \ldots, x_n$ have very small total influence
in $p$:

\[ \sum_{i=K+1}^{n} \mathrm{Inf}_i(p) = \sum_{i=K+1}^{n} \sum_{S \ni i} \hat{p}(S)^2 \le (1 - \tau)^K \cdot \mathrm{Inf}(p) \le d/n^{2d}. \qquad (5) \]

Therefore, if we let $p'$ be the truncation of $p$ comprising only the monomials with all variables
in $\mathcal{H}$,

\[ p'(x_1, \ldots, x_K) = \sum_{S \subseteq \mathcal{H}} \hat{p}(S)\, x_S \]
we know that almost all of the original Fourier weight of $p$ is on the coefficients of $p'$:

\[ 1 \ge \sum_{S \subseteq \mathcal{H},\, |S| > 0} \hat{p}(S)^2 \ge 1 - \sum_{i=K+1}^{n} \mathrm{Inf}_i(p) \ge 1 - d/n^{2d}. \]

We now apply Theorem 2.4 to $p'$ ^ and get:

\[ \Pr_{x \in \{-1,1\}^K}[\, |p'(x)| \ge 1/2^{O(d)} \,] \ge 1/2^{O(d)}. \]

In words, for a $1/2^{O(d)}$ fraction of all restrictions $\rho$ to $x_1, \ldots, x_K$, the value $p'(\rho)$ has magnitude
at least $1/2^{O(d)}$.

For any such restriction, if the function $f_\rho(x) = \mathrm{sign}(p_\rho(x_{K+1}, \ldots, x_n))$ is not a constant
function it must necessarily be the case that:

0<\S\C{xK + l,-,Xn} 

As noted in (5), each tail variable $\ell > K$ has very small influence in $p$:

\[ \mathrm{Inf}_\ell(p) \le \sum_{i=K+1}^{n} \mathrm{Inf}_i(p) \le d/n^{2d}. \]

Applying Lemma 5.6, we get that for the overwhelming majority of the $1/2^{O(d)}$ fraction of
restrictions mentioned above, the influence of $\ell$ in $p_\rho$ is not much larger than the influence
of $\ell$ in $p$:

\[ \mathrm{Inf}_\ell(p_\rho) \le (3 \log n)^d \cdot \mathrm{Inf}_\ell(p) \le d \cdot (3 \log n)^d / n^{2d}. \qquad (6) \]
Using Cauchy-Schwarz, we have



S3e,sc{xK+i,-,x„} 



< 

where we have used (6) (and our upper bound on d). From this we easily get that 

^ after a very slight rescaling so the non-constant Fourier coefficients of $p'$ have sum of squares equal to 1;
this does not affect the bound we get because of the big-O.







0<\S\C{xK+i,-,Xu} 

We have established that for a $1/2^{O(d)}$ fraction of all restrictions to $x_1, \ldots, x_K$, the function
$f_\rho = \mathrm{sign}(p_\rho)$ is a constant function, and the lemma is proved. ■

5.7 Proof of Claim 5.2 

If $f$ is a degree-$d$ PTF that is not $\tau$-regular, then its $\tau$-critical index is either in the range
$\{1, \ldots, K\}$ or it is greater than $K$.

In the first case (small critical index case), as shown in Section 5.5, we have that for a $1/2^{O(d)}$
fraction of restrictions $\rho$ to variables $x_1, \ldots, x_k$, the total influence of $f_\rho = \mathrm{sign}(p_\rho)$ is at most

\[ O(d \cdot n \cdot (\tau')^{1/(4d+1)}) = O(d \cdot (\log n)^{1/4} \cdot n^{1 - 1/(4d+2)}), \]

so the conclusion of Claim 5.2 holds in this case.

In the second case (large critical index case), as shown in Section 5.6, for a $1/2^{O(d)}$ fraction
of restrictions $\rho$ to $x_1, \ldots, x_K$ the function $f_\rho$ is constant and hence has zero influence, so
the conclusion of Claim 5.2 certainly holds in this case as well. ■

6 A Fourier-Analytic Bound on Boolean Average Sensitivity

In this section, we present a simple proof of the following upper bound on the average
sensitivity of a degree-$d$ PTF (Theorem 1.2):

\[ \mathrm{AS}(n,d) \le 2 n^{1 - 1/2^d}. \]

We recall here the definition of the formal derivative of a function $p : \{-1,1\}^n \to \mathbb{R}$:

\[ D_i p(x) = \sum_{S \ni i} \hat{p}(S)\, x_{S \setminus \{i\}}. \]

It is easy to see that,

\[ D_i p(x) = \frac{1}{2} x_i \left[ p(x) - p(x^{\oplus i}) \right] = \frac{1}{2} \Big( \sum_{x_i \in \{-1,1\}} x_i\, p(x) \Big) \qquad (7) \]
where "$x^{\oplus i}$" means "$x$ with the $i$-th bit flipped."

For a Boolean function $f$, we have $D_i f(x) = \pm 1$ iff flipping the $i$-th bit flips $f$; otherwise
$D_i f(x) = 0$. So we have

\[ \mathrm{Inf}_i(f) = E[\, |D_i f(x)| \,]. \]
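Lemma 6.1 Fix $i \neq j \in [n]$. Let $f, g : \{-1,1\}^n \to \{-1,1\}$ be functions such that $f$ is independent
of the $i$-th bit $x_i$ and $g$ is independent of the $j$-th bit $x_j$. Then

\[ E_x[x_i x_j f(x) g(x)] \le \frac{\mathrm{Inf}_i(g) + \mathrm{Inf}_j(f)}{2}. \]

Proof: First, note that the influence of the $i$-th coordinate on a function $f$ can be written as:

\[ \mathrm{Inf}_i(f) = E_{x_{-i}}[\mathrm{Var}_{x_i}[f(x)]] = E_x\left[ \tfrac{1}{2} |f(x^{\oplus i}) - f(x)| \right] = E_{x_{-i}}\left[ |E_{x_i}[x_i f(x)]|^2 \right]. \qquad (8) \]

As $f$ is independent of $x_i$ and $g$ is independent of $x_j$, we can write,

\[ E_x[x_i x_j f(x) g(x)] = E_{x \setminus \{i,j\}}\, E_{x_i, x_j}[x_i x_j f(x) g(x)] \]
\[ = E_{x \setminus \{i,j\}}\left[ E_{x_i}[x_i g(x)] \cdot E_{x_j}[x_j f(x)] \right] \]
\[ \le E_{x \setminus \{i,j\}}\left[ \tfrac{1}{2} |E_{x_i}[x_i g(x)]|^2 + \tfrac{1}{2} |E_{x_j}[x_j f(x)]|^2 \right] \qquad (\text{using } ab \le \tfrac{1}{2}(a^2 + b^2)) \]
\[ \le \frac{\mathrm{Inf}_j(f) + \mathrm{Inf}_i(g)}{2} \qquad (\text{using Equation 8}) \]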


Theorem 1.2 is shown using an inductive argument over the degree $d$. Central to this inductive
argument is the following lemma relating the influences of a degree-$d$ PTF $\mathrm{sign}(p(x))$ to
the degree-$(d-1)$ PTFs obtained by taking formal derivatives of $p$.

Lemma 6.2 For a PTF $f = \mathrm{sign}(p(x))$ on $n$ variables and $i \in [n]$, $\mathrm{Inf}_i(f) = E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))]$.

The following simple claim will be useful in the proof of the above lemma.

Claim 6.3 For two real numbers $a, b$, if $\mathrm{sign}(a) \neq \mathrm{sign}(b)$ then

\[ \mathrm{sign}(\mathrm{sign}(a) - \mathrm{sign}(b)) = \mathrm{sign}(a - b). \]

Proof: If $\mathrm{sign}(a) = 1$ and $\mathrm{sign}(b) = -1$ ($a > 0$, $b < 0$) then $a - b > 0$. Hence in this case,
$\mathrm{sign}(a - b) = 1 = \mathrm{sign}(1 - (-1)) = \mathrm{sign}(\mathrm{sign}(a) - \mathrm{sign}(b))$. On the other hand, if $\mathrm{sign}(a) = -1$
and $\mathrm{sign}(b) = 1$, then $\mathrm{sign}(a - b) = -1 = \mathrm{sign}((-1) - 1) = \mathrm{sign}(\mathrm{sign}(a) - \mathrm{sign}(b))$. ■

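Proof: [of Lemma 6.2] The influence of the $i$-th coordinate is given by,

\[ \mathrm{Inf}_i(f) = E_x\left[ \tfrac{1}{2} |f(x) - f(x^{\oplus i})| \right] = \tfrac{1}{2}\, E_x\left[ (f(x) - f(x^{\oplus i}))\, \mathrm{sign}(f(x) - f(x^{\oplus i})) \right]. \qquad (9) \]
Consider an $x$ for which $f(x) \neq f(x^{\oplus i})$. In this case, we can use Claim 6.3 to conclude:

\[ \mathrm{sign}(f(x) - f(x^{\oplus i})) = \mathrm{sign}(p(x) - p(x^{\oplus i})) \]

\[ = \mathrm{sign}(2 x_i D_i p(x)) = x_i\, \mathrm{sign}(D_i p(x)). \qquad (\text{using (7)}) \]

Hence for an $x$ with $f(x) \neq f(x^{\oplus i})$,

\[ (f(x) - f(x^{\oplus i}))\, \mathrm{sign}(f(x) - f(x^{\oplus i})) = (f(x) - f(x^{\oplus i}))\, x_i\, \mathrm{sign}(D_i p(x)). \]

On the other hand, if $f(x) = f(x^{\oplus i})$ then the above equation continues to hold since both
sides evaluate to 0. Substituting this equality into Equation 9 yields,

\[ \mathrm{Inf}_i(f) = \tfrac{1}{2} E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))] - \tfrac{1}{2} E[f(x^{\oplus i})\, x_i\, \mathrm{sign}(D_i p(x))]. \]

Notice that the $i$-th coordinate $(x^{\oplus i})_i$ of $x^{\oplus i}$ is given by $-x_i$. Since $D_i p$ is independent of the
$i$-th coordinate $x_i$, we have $D_i p(x) = D_i p(x^{\oplus i})$. Rewriting the above equation, we get

\[ \mathrm{Inf}_i(f) = \tfrac{1}{2} E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))] + \tfrac{1}{2} E[f(x^{\oplus i})\, (x^{\oplus i})_i\, \mathrm{sign}(D_i p(x^{\oplus i}))] \]
\[ = E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))] \qquad (x^{\oplus i} \text{ is also uniformly distributed}) \]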
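
The identity of Lemma 6.2 can be verified exhaustively on a small PTF. The following sketch is an illustration only; it represents $p$ as a dictionary from variable sets to coefficients (a hypothetical representation), computes $\mathrm{Inf}_i(f)$ once by counting sensitive inputs and once through $E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))]$, and assumes random real coefficients so that $p$ and $D_i p$ never vanish on the cube.

```python
import itertools
import random
from math import prod

def evaluate(coeffs, x):
    """Evaluate the multilinear polynomial sum_S coeffs[S] * prod_{i in S} x_i."""
    return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())

def derivative(coeffs, i):
    """Formal derivative D_i p: keep monomials containing i and drop x_i from them."""
    return {S - {i}: c for S, c in coeffs.items() if i in S}

def sign(v):
    return 1 if v > 0 else -1

def influence_by_flips(coeffs, n, i):
    """Inf_i(f) as the fraction of inputs where flipping bit i changes f = sign(p)."""
    count = 0
    for x in itertools.product((-1, 1), repeat=n):
        y = list(x)
        y[i] = -y[i]
        count += int(sign(evaluate(coeffs, x)) != sign(evaluate(coeffs, y)))
    return count / 2 ** n

def influence_by_lemma(coeffs, n, i):
    """Inf_i(f) computed as E[f(x) * x_i * sign(D_i p(x))]."""
    dp = derivative(coeffs, i)
    total = 0
    for x in itertools.product((-1, 1), repeat=n):
        total += sign(evaluate(coeffs, x)) * x[i] * sign(evaluate(dp, x))
    return total / 2 ** n

# A random degree-2 polynomial on 5 variables (0-indexed).
random.seed(0)
n, d = 5, 2
coeffs = {frozenset(S): random.uniform(-1, 1)
          for r in range(d + 1) for S in itertools.combinations(range(n), r)}
for i in range(n):
    print(i, influence_by_flips(coeffs, n, i), influence_by_lemma(coeffs, n, i))
```

Both quantities are dyadic rationals with denominator $2^n$, so the two computations agree exactly rather than only up to rounding.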



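Theorem 6.4 Let $\mathrm{AS}(n,d)$ denote the max possible average sensitivity of any degree-$d$ PTF
on $n$ variables. Then we have

\[ \mathrm{AS}(n,d) \le \sqrt{n + n \cdot \mathrm{AS}(n, d-1)}. \]

Proof:

\[ \mathrm{Inf}(f) = \sum_i \mathrm{Inf}_i(f) \]

\[ = \sum_i E[f(x)\, x_i\, \mathrm{sign}(D_i p(x))] \qquad (\text{by Lemma 6.2}) \]

\[ = E\Big[ f(x) \sum_i x_i\, \mathrm{sign}(D_i p(x)) \Big] \]

\[ \le \sqrt{E[f(x)^2]} \cdot \sqrt{E\Big[ \Big( \sum_i x_i\, \mathrm{sign}(D_i p(x)) \Big)^2 \Big]} \qquad (10) \]

\[ = 1 \cdot \sqrt{E\Big[ \sum_{i,j} x_i x_j\, \mathrm{sign}(D_i p(x))\, \mathrm{sign}(D_j p(x)) \Big]} \qquad (11) \]

\[ \le \sqrt{E\Big[ \sum_i x_i^2\, \mathrm{sign}(D_i p(x))^2 \Big] + \sum_{i \neq j} \mathrm{Inf}_i(\mathrm{sign}(D_j p(x)))} \qquad (12) \]

\[ = \sqrt{n + \sum_{i \neq j} \mathrm{Inf}_i(\mathrm{sign}(D_j p(x)))}. \qquad (13) \]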
Here (10) is the Cauchy-Schwarz inequality, (11) is expanding the square. Step (12) uses
Lemma 6.1 which we may apply since $D_i p(x)$ does not depend on $x_i$.

Observe that for any fixed $j'$, we have $D_{j'} p(x)$ is a degree-$(d-1)$ polynomial and $\mathrm{sign}(D_{j'} p(x))$
is a degree-$(d-1)$ PTF. Hence, by definition we have,

\[ \sum_{i \neq j'} \mathrm{Inf}_i(\mathrm{sign}(D_{j'} p(x))) \le \mathrm{AS}(n, d-1), \]

for all $j' \in [n]$. Therefore the quantity $\sum_{i \neq j} \mathrm{Inf}_i(\mathrm{sign}(D_j p(x))) \le n \cdot \mathrm{AS}(n, d-1)$, finishing
the proof. ■

The bound on average sensitivity (Theorem 1.2) follows immediately from the above recursive
relation.

Proof: [of Theorem 1.2] Clearly, we have $\mathrm{AS}(n,0) = 0$. For $d = 1$, Theorem 6.4 yields
$\mathrm{AS}(n,1) \le \sqrt{n}$. Now suppose $\mathrm{AS}(n,d) \le 2n^{1 - 1/2^d}$ for $d \ge 1$, then by Theorem 6.4,

\[ \mathrm{AS}(n, d+1) \le \sqrt{n + n \cdot \mathrm{AS}(n,d)} \le \sqrt{4 n^{2 - 1/2^d}} = 2 n^{1 - 1/2^{d+1}}, \]
finishing the proof. ■
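
The induction above can be mirrored numerically by iterating the recursion of Theorem 6.4 and comparing the result with the closed form of Theorem 1.2. This is only a small illustrative sketch with hypothetical function names; it treats the recursion as an equality, which gives the largest value the recursion allows.

```python
import math

def recursive_bound(n, d):
    """Iterate b_d = sqrt(n + n * b_{d-1}) starting from b_0 = AS(n, 0) = 0."""
    bound = 0.0
    for _ in range(d):
        bound = math.sqrt(n + n * bound)
    return bound

def closed_form(n, d):
    """The bound 2 * n^(1 - 1/2^d) of Theorem 1.2."""
    return 2.0 * n ** (1.0 - 0.5 ** d)

for n in (10, 1000, 10 ** 6):
    for d in range(1, 7):
        print(n, d, recursive_bound(n, d), closed_form(n, d), min(n, closed_form(n, d)))
```

The last column takes the minimum with the trivial bound $\mathrm{AS}(n,d) \le n$, which shows that the closed form is informative only while $n^{1/2^d} > 2$.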



7 Boolean average sensitivity vs noise sensitivity 

Our results on Boolean noise sensitivity are obtained via the following simple reduction which
translates any upper bound on average sensitivity for degree-$d$ PTFs over Boolean variables
into a corresponding upper bound on noise sensitivity. This theorem is inspired by the proof
of noise sensitivity of halfspaces by Peres [Per04].

Theorem 7.1 Let $\mathrm{NS}(\epsilon, d)$ denote the maximum noise sensitivity of a degree-$d$ PTF at a
noise rate of $\epsilon$. For all $0 < \epsilon < 1$, if $m = \lfloor 1/\epsilon \rfloor$ then,

\[ \mathrm{NS}(\epsilon, d) \le \frac{1}{m}\, \mathrm{AS}(m, d). \]

Theorem 1.3 follows immediately from this reduction along with our bounds on Boolean
average sensitivity (Theorems 1.1 and 1.2), so it remains for us to prove Theorem 7.1.



7.1 Proof of Theorem 7.1 

Let $f(x) = \mathrm{sign}(p(x))$ be a degree-$d$ PTF. Let us denote $\delta = 1/m$. As $\delta \ge \epsilon$, by the monotonicity
of noise sensitivity we have $\mathrm{NS}_\epsilon(f) \le \mathrm{NS}_\delta(f)$. In the following, we will show that $\mathrm{NS}_\delta(f) \le
\frac{1}{m} \mathrm{AS}(m,d)$ which implies the intended result. Recall that $\mathrm{NS}_\delta(f)$ is defined as

\[ \mathrm{NS}_\delta(f) = \Pr_{x \sim_\delta y}[f(x) \neq f(y)], \]
where $x \sim_\delta y$ denotes that $y$ is generated by flipping each bit of $x$ independently with
probability $\delta$. An alternate way to generate $y$ from $x$ is as follows:

- Sample $r \in \{1, \ldots, m\}$ uniformly at random.

- Partition the bits of $x$ into $m = 1/\delta$ sets $S_1, S_2, \ldots, S_m$ by independently assigning each
bit to a uniformly random set. Formally, a partition $\alpha$ is specified by a function
$\alpha : \{1, \ldots, n\} \to \{1, \ldots, m\}$ mapping bit locations to their partition numbers, i.e., $i \in
S_{\alpha(i)}$. A uniformly random partition is picked by sampling $\alpha(i)$ for each $i \in \{1, \ldots, n\}$
uniformly at random from $\{1, \ldots, m\}$.

- Flip the bits of $x$ contained in the set $S_r$ to obtain $y$.

Each bit of $x$ belongs to the set $S_r$ independently with probability $1/m = \delta$. Therefore, the
vector $y$ generated by the above procedure can equivalently be generated by flipping each
bit of $x$ with probability $\delta$.

Inspired by the above procedure, we now define an alternate equivalent procedure to generate
the pair $x \sim_\delta y$.

- Sample $a \in \{-1,1\}^n$ uniformly at random.

- Sample a uniformly random partition $\alpha : \{1, \ldots, n\} \to \{1, \ldots, m\}$ of the bits of $a$.

- Sample $z \in \{-1,1\}^m$ uniformly at random.

- Sample $r \in \{1, \ldots, m\}$ uniformly at random. Let $\tilde{z} = z^{\oplus r}$ and

\[ x_i = a_i z_{\alpha(i)}, \qquad y_i = a_i \tilde{z}_{\alpha(i)}. \]

Notice that $x$ is uniformly distributed in $\{-1,1\}^n$, since both $a$ and $z$ are uniformly distributed
in $\{-1,1\}^n$ and $\{-1,1\}^m$ respectively. Furthermore, $\tilde{z}_i = z_i$ for all $i \neq r$ and
$\tilde{z}_r = -z_r$. Therefore, $y$ is obtained by flipping the bits of $x$ in the coordinates belonging
to the $r$-th partition. As the partition $\alpha$ is generated uniformly at random, this amounts to
flipping each bit of $x$ with probability exactly $1/m = \delta$.

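The noise sensitivity of $f$ can be rewritten as,

\[ \mathrm{NS}_\delta(f) = \Pr_{a, \alpha, z, r}[f(x) \neq f(y)]. \]

For a fixed choice of $a$ and $\alpha$, $f(x)$ is a function of $z$. In this light, let us define the function
$f_{a,\alpha} : \{-1,1\}^m \to \{-1,1\}$ for each $a, \alpha$ as $f_{a,\alpha}(z) = f(x)$. Returning to the expression for
noise sensitivity we get:

\[ \mathrm{NS}_\delta(f) = E_{a,\alpha}\Big[ \frac{1}{m} \sum_{r=1}^{m} \Pr_z[f_{a,\alpha}(z) \neq f_{a,\alpha}(z^{\oplus r})] \Big] = E_{a,\alpha}\Big[ \frac{1}{m} \sum_{r=1}^{m} E_z\big[ 1[f_{a,\alpha}(z) \neq f_{a,\alpha}(z^{\oplus r})] \big] \Big]. \]

In the above calculation, the notation $1[E]$ refers to the indicator function of the event $E$.
Recall that, by definition of influences,

\[ \mathrm{Inf}_r(f_{a,\alpha}) = E_z\big[ 1[f_{a,\alpha}(z) \neq f_{a,\alpha}(z^{\oplus r})] \big], \]
for all $r$. Thus, we can rewrite the noise sensitivity of $f$ as

\[ \mathrm{NS}_\delta(f) = E_{a,\alpha}\Big[ \frac{1}{m} \sum_{r=1}^{m} \mathrm{Inf}_r(f_{a,\alpha}) \Big] = \frac{1}{m}\, E_{a,\alpha}[\mathrm{Inf}(f_{a,\alpha})]. \qquad (14) \]

We claim that $f_{a,\alpha}$ is a degree-$d$ PTF in $m$ variables. To see this observe that

\[ f_{a,\alpha}(z) = \mathrm{sign}(p(x_1, \ldots, x_n)) = \mathrm{sign}(p(a_1 z_{\alpha(1)}, \ldots, a_n z_{\alpha(n)})), \]

which for a fixed choice of $a, \alpha$ is a degree-$d$ PTF in $z$. Consequently, by definition of
$\mathrm{AS}(m,d)$ we have $\mathrm{Inf}(f_{a,\alpha}) \le \mathrm{AS}(m,d)$ for all $a$ and $\alpha$. Using this in (14), the result follows.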
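
Identity (14) can be confirmed exactly on a tiny instance by enumerating every choice of $a$, $\alpha$, $z$ and $r$. The sketch below is an illustration only: the particular degree-2 PTF, the sizes $n = 4$ and $m = 3$, and all function names are hypothetical choices, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
import itertools
from fractions import Fraction

def f(x):
    """A fixed degree-2 PTF on 4 bits; p(x) is always odd, so it never equals 0."""
    p = 1 + 2 * x[0] * x[1] + 4 * x[2] * x[3] + 8 * x[0] * x[3] + 16 * x[1]
    return 1 if p > 0 else -1

def noise_sensitivity(f, n, delta):
    """NS_delta(f): each bit of a uniform x is flipped independently with probability delta."""
    total = Fraction(0)
    for x in itertools.product((-1, 1), repeat=n):
        for flips in itertools.product((0, 1), repeat=n):
            y = tuple(-xi if b else xi for xi, b in zip(x, flips))
            if f(x) != f(y):
                k = sum(flips)
                total += delta ** k * (1 - delta) ** (n - k)
    return total / 2 ** n

def total_influence(g, m):
    """Inf(g) = sum_r Pr_z[g(z) != g(z with bit r flipped)] for g on m bits."""
    total = Fraction(0)
    for z in itertools.product((-1, 1), repeat=m):
        for r in range(m):
            zr = tuple(-v if j == r else v for j, v in enumerate(z))
            total += int(g(z) != g(zr))
    return total / 2 ** m

def averaged_influence(f, n, m):
    """(1/m) * E_{a, alpha}[Inf(f_{a,alpha})], where f_{a,alpha}(z) = f((a_i * z_{alpha(i)})_i)."""
    total = Fraction(0)
    count = 0
    for a in itertools.product((-1, 1), repeat=n):
        for alpha in itertools.product(range(m), repeat=n):
            g = lambda z, a=a, alpha=alpha: f(tuple(a[i] * z[alpha[i]] for i in range(n)))
            total += total_influence(g, m)
            count += 1
    return total / (count * m)

lhs = noise_sensitivity(f, 4, Fraction(1, 3))
rhs = averaged_influence(f, 4, 3)
print(lhs, rhs)
```

The agreement reflects the coupling argument: for each fixed partition and block $r$, the flipped set is exactly the block $S_r$, and averaging over $a$ makes $x$ uniform.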



8 Application to Agnostic Learning 

In this section, we outline the applications of the noise sensitivity bounds presented in this
work to agnostic learning of PTFs. Specifically, we will present the proofs of Theorem 1.6
and Theorem 1.7. To begin with, we recall the main theorem of [KKMS08] about the $L_1$
polynomial regression algorithm:

Theorem 8.1 Let $\mathcal{D}$ be a distribution over $X \times \{-1,1\}$ (where $X \subseteq \mathbb{R}^n$) which has marginal
$\mathcal{D}_X$ over $X$. Let $\mathcal{C}$ be a class of Boolean-valued functions over $X$ such that for every $f \in \mathcal{C}$,
there is a degree-$d$ polynomial $p(x_1, \ldots, x_n)$ such that $E_{x \sim \mathcal{D}_X}[(p(x) - f(x))^2] \le \epsilon^2$. Then given
independent draws from $\mathcal{D}$, the $L_1$ polynomial regression algorithm runs in time $\mathrm{poly}(n^d, 1/\epsilon, \log(1/\delta))$
and with probability $1 - \delta$ outputs a hypothesis $h : X \to \{-1,1\}$ such that $\Pr_{(x,y) \sim \mathcal{D}}[h(x) \neq
y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \min_{f \in \mathcal{C}} \Pr_{(x,y) \sim \mathcal{D}}[f(x) \neq y]$.

We first consider the case where $\mathcal{D}_X$ is the uniform distribution over the $n$-dimensional
Boolean hypercube $\{-1,1\}^n$. Klivans et al. [KOS04] observed that Boolean noise sensitivity
bounds are easily shown to imply the existence of low-degree polynomial approximators in
the $L_2$ norm under the uniform distribution on $\{-1,1\}^n$:

Fact 8.2 For any Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ and any value $0 < \gamma <
1/2$, there is a polynomial $p(x)$ of degree at most $d = 1/\gamma$ such that $E[(p(x) - f(x))^2] \le
\frac{2}{1 - e^{-2}}\, \mathrm{NS}_\gamma(f)$.

Theorem 1.6 follows directly from Theorem 8.1, Fact 8.2 and Theorem 1.3.

Next we turn to the case where $\mathcal{D}_X$ is the $\mathcal{N}(0, I_n)$ distribution over $\mathbb{R}^n$. [KOS08] observed
that using entirely similar arguments to the Boolean case, Gaussian noise sensitivity bounds
imply the existence of low-degree polynomial approximators in the $L_2$ norm:
Fact 8.3 For any Boolean function $f : \mathbb{R}^n \to \{-1,1\}$ and any value $0 < \gamma < 1/2$,
there is a polynomial $p(x)$ of degree at most $d = 1/\gamma$ such that $E_{G \sim \mathcal{N}(0, I_n)}[(p(G) - f(G))^2] \le
\frac{2}{1 - e^{-1}}\, \mathrm{GNS}_\gamma(f)$.

For the special case of learning under the standard multivariate Gaussian $\mathcal{N}^n$, Theorem 1.7
follows directly from Theorem 8.1, Fact 8.3 and Theorem 1.5. Since our results hold for all
degree-$d$ PTFs, the extension to arbitrary Gaussian distributions follows exactly as described
in Appendix C of [KOS08].



9 Discussion 

An obvious question left open by this work is to actually resolve the Gotsman-Linial
conjecture and show that every degree-$d$ PTF over $\{-1,1\}^n$ has average sensitivity at most
$O(d\sqrt{n})$. [GS09] show that this would have interesting implications in computational
learning theory beyond the obvious strengthenings of the agnostic learning results presented in
this paper.

In this section we observe (Proposition 9.1) that this conjecture is in fact equivalent to a
strong upper bound on the Boolean noise sensitivity of degree-$d$ PTFs. We further point
out (Proposition 9.2) that Gaussian noise sensitivity of degree-$d$ PTFs is upper bounded
by Boolean noise sensitivity. Thus, we propose working on improved upper bounds for the
Gaussian noise sensitivity of degree-$d$ PTFs as a preliminary - in fact, necessary - step to
settling the Gotsman-Linial conjecture.

Proposition 9.1 The following two statements are equivalent: 

1. Every degree-$d$ PTF $f$ over $\{-1,1\}^n$ has $\mathrm{AS}(f) \le O(d\sqrt{n})$.

2. Every degree-$d$ PTF $f$ over $\{-1,1\}^n$ has $\mathrm{NS}_\epsilon(f) \le O(d\sqrt{\epsilon})$ for all $\epsilon$.

Proof: 

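1) $\Rightarrow$ 2): This follows immediately from Theorem 7.1.

2) $\Rightarrow$ 1): Let $f = \mathrm{sign}(p)$ be a degree-$d$ PTF. We have

$$\begin{aligned}
\mathrm{NS}_{1/n}(f) &= \Pr_{x,y}[f(x) \neq f(y)] \\
&= \sum_{k=0}^{n} \Pr_{x,y}[f(x) \neq f(y) \mid y \text{ flips } k \text{ of } x\text{'s bits}] \cdot \Pr_{x,y}[y \text{ flips } k \text{ of } x\text{'s bits}] \\
&\ge \Pr_{x,y}[f(x) \neq f(y) \mid y \text{ flips } 1 \text{ of } x\text{'s bits}] \cdot \Pr_{x,y}[y \text{ flips } 1 \text{ of } x\text{'s bits}] \\
&\ge (1/n)\,\mathrm{AS}(f) \cdot \Theta(1),
\end{aligned}$$

where the last inequality holds because at noise rate $1/n$, there is constant probability
that $y$ flips exactly 1 of $x$'s bits, and conditioned on this taking place, the probability that
$f(x) \neq f(y)$ is exactly $\mathrm{AS}(f)/n$. Taking $\epsilon = 1/n$ in 2) gives $\mathrm{NS}_{1/n}(f) \le O(d/\sqrt{n})$, and
rearranging yields $\mathrm{AS}(f) \le O(d\sqrt{n})$, which is 1). ■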
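
The chain of inequalities in the proof of Proposition 9.1 can be checked by brute force on a small example. The following Python sketch is an illustration only, not part of the original argument. It assumes the noise model in which $y$ is obtained from $x$ by flipping each bit independently with probability $\epsilon$, and the function names (`majority`, `average_sensitivity`, `noise_sensitivity`, `one_flip_lower_bound`) are hypothetical helpers introduced here.

```python
from itertools import product


def majority(x):
    """Degree-1 PTF: sign of the sum of the bits (n is assumed odd)."""
    return 1 if sum(x) > 0 else -1


def average_sensitivity(f, n):
    """AS(f) = sum over i of Pr_x[f(x) != f(x with bit i flipped)]."""
    total = 0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(y):
                total += 1
    return total / 2 ** n


def noise_sensitivity(f, n, eps):
    """NS_eps(f), assuming each bit of x is flipped independently w.p. eps."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            weight = eps ** k * (1 - eps) ** (n - k)
            y = tuple(-xi if b else xi for xi, b in zip(x, flips))
            if f(x) != f(y):
                total += weight
    return total / 2 ** n


def one_flip_lower_bound(f, n):
    """Pr[exactly one flip at rate 1/n] * AS(f)/n, the k = 1 term of the sum."""
    p_one_flip = (1 - 1 / n) ** (n - 1)
    return p_one_flip * average_sensitivity(f, n) / n


if __name__ == "__main__":
    n = 5
    print("AS(f)            =", average_sensitivity(majority, n))
    print("NS_{1/n}(f)      =", noise_sensitivity(majority, n, 1 / n))
    print("k = 1 lower bound =", one_flip_lower_bound(majority, n))
```

The factor $(1 - 1/n)^{n-1}$ is at least $1/e$ for every $n$, which is the $\Theta(1)$ term in the last line of the derivation.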



Proposition 9.2 Let $\mathrm{NS}(\epsilon, d)$ and $\mathrm{GNS}(\epsilon, d)$ denote the maximum noise sensitivity of a degree-$d$
PTF in the Boolean and Gaussian domains respectively. For all $\epsilon$ and $d$, we have

$$\mathrm{NS}(\epsilon, d) \ge \mathrm{GNS}(\epsilon, d).$$

Proof: Consider a degree-$d$ PTF $f = \mathrm{sign}(p(x))$ in the Gaussian setting. We will define a
sequence of degree-$d$ PTFs $\{h_k\}_{k=1}^{\infty}$ over the Boolean domain. The function $h_k : \{-1,1\}^{nk} \to \{-1,1\}$
is on $nk$ input bits $\{y_i^{(j)} \mid i \in [n], j \in [k]\}$ and is given by

$$h_k\big(y_1^{(1)}, \ldots, y_n^{(k)}\big) \stackrel{\text{def}}{=} \mathrm{sign}\left(p\left(\frac{\sum_{j \in [k]} y_1^{(j)}}{\sqrt{k}}, \frac{\sum_{j \in [k]} y_2^{(j)}}{\sqrt{k}}, \ldots, \frac{\sum_{j \in [k]} y_n^{(j)}}{\sqrt{k}}\right)\right).$$

By the Central Limit Theorem, the normalized sum $\sum_{j \in [k]} y_i^{(j)} / \sqrt{k}$ of $k$ independent uniform random
values from $\{-1,1\}$ tends in distribution to the normal distribution $\mathcal{N}(0,1)$ as $k \to \infty$.
Intuitively, this implies that as $k \to \infty$, among other things, the Boolean noise sensitivity
of $h_k$ approaches the Gaussian noise sensitivity of $f$. However, since $h_k$ is a Boolean PTF, its noise
sensitivity is bounded by $\mathrm{NS}(\epsilon, d)$.

We now present the details of the above argument. Consider the random variables $y =
(y_1, \ldots, y_n)$, $\tilde{y} = (\tilde{y}_1, \ldots, \tilde{y}_n) \in \{-1,1\}^n$ generated by setting each $y_i$ to a uniform random
value in $\{-1,1\}$ and $\tilde{y}_i$ as

$$\tilde{y}_i = \begin{cases} y_i & \text{with probability } 1 - \epsilon, \\ \text{uniform value in } \{-1,1\} & \text{with probability } \epsilon. \end{cases}$$



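It is clear that $\mathbb{E}[y_i \tilde{y}_i] = 1 - \epsilon$ for all $i \in [n]$ and all other pairwise correlations are 0. Let
$\{(y^{(1)}, \tilde{y}^{(1)}), \ldots, (y^{(k)}, \tilde{y}^{(k)})\}$ be $k$ independent samples of $(y, \tilde{y})$. By definition of Boolean
noise sensitivity,

$$\mathrm{NS}_\epsilon(h_k) = \Pr[h_k(y) \neq h_k(\tilde{y})] = \Pr\left[p\left(\frac{\sum_{j \in [k]} y^{(j)}}{\sqrt{k}}\right) p\left(\frac{\sum_{j \in [k]} \tilde{y}^{(j)}}{\sqrt{k}}\right) < 0\right].$$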

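Let $x \sim \mathcal{N}^n$, $z \sim \mathcal{N}^n$ be independent and let $\tilde{x} = \alpha x + \beta z$, with $\alpha = 1 - \epsilon$ and $\beta = \sqrt{2\epsilon - \epsilon^2}$.
By the Multidimensional Central Limit Theorem [Fel68], as $k \to \infty$ we have the following
convergence in distribution:

$$\left(\frac{\sum_{j \in [k]} y^{(j)}}{\sqrt{k}}, \frac{\sum_{j \in [k]} \tilde{y}^{(j)}}{\sqrt{k}}\right) \xrightarrow{\ \mathcal{D}\ } (x, \tilde{x}).$$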
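Since the function $a(x, \tilde{x}) = p(x) \cdot p(\tilde{x})$ is a continuous function, we get

$$\lim_{k \to \infty} \mathrm{NS}_\epsilon(h_k) = \lim_{k \to \infty} \Pr\left[p\left(\frac{\sum_{j \in [k]} y^{(j)}}{\sqrt{k}}\right) p\left(\frac{\sum_{j \in [k]} \tilde{y}^{(j)}}{\sqrt{k}}\right) < 0\right] = \Pr_{x, \tilde{x}}[p(x) p(\tilde{x}) < 0] = \mathrm{GNS}_\epsilon(f).$$

Since $\mathrm{NS}_\epsilon(h_k) \le \mathrm{NS}(\epsilon, d)$ for every $k$, it follows that $\mathrm{GNS}_\epsilon(f) \le \mathrm{NS}(\epsilon, d)$,
and the result is proved.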
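
The limiting argument in the proof of Proposition 9.2 can be illustrated numerically in the simplest case $n = 1$, $p(x) = x$, where $h_k$ is the majority of $k$ bits. The Python sketch below is an illustration under stated assumptions, not part of the original proof. It computes $\mathrm{NS}_\epsilon(h_k)$ exactly by dynamic programming over the joint distribution of the two sums, using the pair distribution $(y_i, \tilde{y}_i)$ defined in the proof, and compares it with $\mathrm{GNS}_\epsilon(\mathrm{sign}(x)) = \arccos(1-\epsilon)/\pi$, which is Sheppard's formula for Gaussians with correlation $1 - \epsilon$. The function names are hypothetical, and $k$ is assumed odd so that neither sum is zero.

```python
import math
from collections import defaultdict


def boolean_ns_majority(k, eps):
    """Exact NS_eps(h_k) for h_k(y) = sign(sum_j y_j / sqrt(k)), with k odd.

    Each pair (y_j, y~_j) has y_j uniform and y~_j = y_j w.p. 1 - eps,
    a fresh uniform bit w.p. eps, so the two bits differ w.p. eps / 2.
    """
    same = (1 - eps / 2) / 2  # Pr[(+1, +1)] = Pr[(-1, -1)]
    diff = eps / 4            # Pr[(+1, -1)] = Pr[(-1, +1)]
    steps = [((1, 1), same), ((-1, -1), same), ((1, -1), diff), ((-1, 1), diff)]

    # dist maps (sum of y, sum of y~) to its probability.
    dist = {(0, 0): 1.0}
    for _ in range(k):
        nxt = defaultdict(float)
        for (a, b), pr in dist.items():
            for (da, db), w in steps:
                nxt[(a + da, b + db)] += pr * w
        dist = nxt

    # The sqrt(k) normalization does not affect signs.
    return sum(pr for (a, b), pr in dist.items() if a * b < 0)


def gaussian_ns_halfspace(eps):
    """GNS_eps(sign(x)) for correlation 1 - eps (Sheppard's formula)."""
    return math.acos(1 - eps) / math.pi


if __name__ == "__main__":
    eps = 0.1
    print("Gaussian limit:", gaussian_ns_halfspace(eps))
    for k in (1, 11, 101):
        print("k =", k, "NS =", boolean_ns_majority(k, eps))
```

For $k = 1$ the function returns $\epsilon/2$, the probability that a single rerandomized bit changes, and the values move toward the Gaussian quantity as $k$ grows.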

A Basics of Hermite Analysis



Here we briefly review the basics of Hermite analysis over $\mathbb{R}^n$ under the distribution $\mathcal{N}^n$. The
reader who is unfamiliar with Hermite analysis should note the many similarities to Fourier
analysis over $\{-1,1\}^n$.

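We work within $L^2(\mathbb{R}^n, \mathcal{N}^n)$, the vector space of all functions $f : \mathbb{R}^n \to \mathbb{R}$ such that
$\mathbb{E}_{x \sim \mathcal{N}^n}[f(x)^2] < \infty$. This is an inner product space under the inner product

$$\langle f, g \rangle = \mathbb{E}_{x \sim \mathcal{N}^n}[f(x) g(x)].$$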

This inner product space has a complete orthonormal basis given by the Hermite polynomials. 
In the case n = 1, this basis is the sequence of polynomials 



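$$h_0(x) = 1, \quad h_1(x) = x, \quad h_2(x) = \frac{x^2 - 1}{\sqrt{2}}, \quad h_3(x) = \frac{x^3 - 3x}{\sqrt{6}}, \quad \ldots,$$

$$h_j(x) = \sqrt{j!} \cdot \left(\frac{x^j}{(j-0)!\,0!\,2^0} - \frac{x^{j-2}}{(j-2)!\,1!\,2^1} + \frac{x^{j-4}}{(j-4)!\,2!\,2^2} - \frac{x^{j-6}}{(j-6)!\,3!\,2^3} + \cdots\right), \quad \ldots$$

which may equivalently be defined by

$$h_d(x) = \frac{(-1)^d}{\sqrt{d!}\,\exp(-x^2/2)} \cdot \frac{d^d}{dx^d} \exp(-x^2/2).$$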



We note that $h_d(x)$ is a polynomial of degree $d$. For general $n$, the basis for $L^2(\mathbb{R}^n, \mathcal{N}^n)$ is
formed by all products of these polynomials, one for each coordinate. In other words, for
each $n$-tuple $S \in \mathbb{N}^n$ we define the $n$-variate Hermite polynomial $H_S : \mathbb{R}^n \to \mathbb{R}$ by

$$H_S(x) = \prod_{i=1}^{n} h_{S_i}(x_i).$$

Then the collection $(H_S)_{S \in \mathbb{N}^n}$ is a complete orthonormal basis for the inner product space.
By orthonormal we mean that

$$\langle H_S, H_T \rangle = \begin{cases} 1 & \text{if } S = T, \\ 0 & \text{if } S \neq T. \end{cases}$$
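
For $n = 1$, orthonormality can be verified exactly for small degrees. The Python sketch below is an illustration only. It builds the integer coefficients of $\sqrt{j!} \cdot h_j(x)$ from the explicit sum formula above and evaluates inner products using the standard Gaussian moments $\mathbb{E}[x^m] = (m-1)!!$ for even $m$ and $0$ for odd $m$. Orthonormality of the $h_j$ is then equivalent to the unnormalized inner product being $j!$ when $i = j$ and $0$ otherwise. The function names are hypothetical helpers introduced here.

```python
from math import factorial


def hermite_coeffs(j):
    """Integer coefficients c[m] of x^m in sqrt(j!) * h_j(x).

    Taken from the explicit sum: the x^(j-2m) term has coefficient
    (-1)^m * j! / (m! * (j-2m)! * 2^m), which is always an integer.
    """
    c = [0] * (j + 1)
    for m in range(j // 2 + 1):
        denom = factorial(m) * factorial(j - 2 * m) * 2 ** m
        c[j - 2 * m] = (-1) ** m * (factorial(j) // denom)
    return c


def gaussian_moment(m):
    """E[x^m] for x ~ N(0, 1): zero for odd m, (m - 1)!! for even m."""
    if m % 2 == 1:
        return 0
    half = m // 2
    return factorial(m) // (2 ** half * factorial(half))


def unnormalized_inner(i, j):
    """E[(sqrt(i!) h_i(x)) * (sqrt(j!) h_j(x))], computed exactly."""
    ci, cj = hermite_coeffs(i), hermite_coeffs(j)
    return sum(
        a * b * gaussian_moment(p + q)
        for p, a in enumerate(ci)
        for q, b in enumerate(cj)
    )


if __name__ == "__main__":
    for i in range(5):
        row = [unnormalized_inner(i, j) for j in range(5)]
        print(row)  # diagonal entries should be i!, all others zero
```

Dividing `unnormalized_inner(i, j)` by $\sqrt{i!\,j!}$ gives $\langle h_i, h_j \rangle$. Working with integers avoids any floating-point tolerance in the check.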



By complete, we mean that every function $f \in L^2$ can be uniquely expressed as

$$f(x) = \sum_{S \in \mathbb{N}^n} \hat{f}(S) H_S(x),$$

where the coefficients $\hat{f}(S)$ are real numbers and the infinite sum converges in the sense that

$$\lim_{d \to \infty} \mathbb{E}\left[\left(f(x) - \sum_{|S| \le d} \hat{f}(S) H_S(x)\right)^2\right] = 0;$$

here we have used the notation

$$|S| = \sum_{i=1}^{n} S_i,$$

which is also the total degree of $H_S(x)$ as a polynomial.

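We call $\hat{f}(S)$ the $S$ Hermite coefficient of $f$. By orthonormality of the basis $(H_S)_{S \in \mathbb{N}^n}$, we
have the following:

$$\hat{f}(S) = \langle f, H_S \rangle = \mathbb{E}[f(x) H_S(x)];$$

$$\|f\|_2^2 = \langle f, f \rangle = \sum_{S \in \mathbb{N}^n} \hat{f}(S)^2 \quad \text{("Parseval's identity")};$$

$$\langle f, g \rangle = \sum_{S \in \mathbb{N}^n} \hat{f}(S) \hat{g}(S) \quad \text{("Plancherel's identity")}.$$

In particular, if $f : \mathbb{R}^n \to \{-1, 1\}$, then $\sum_S \hat{f}(S)^2 = 1$.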
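
As a small worked example of these identities (an illustration added here, not taken from the original text), consider $n = 1$ and the function $f(x) = x^2$. The only facts assumed are the standard Gaussian moments $\mathbb{E}[x^2] = 1$ and $\mathbb{E}[x^4] = 3$.

```latex
% Expansion of f(x) = x^2 in the univariate Hermite basis,
% using h_0(x) = 1 and h_2(x) = (x^2 - 1)/\sqrt{2}.
\begin{align*}
  \hat{f}(0) &= \mathbb{E}[x^2 \cdot h_0(x)] = \mathbb{E}[x^2] = 1, \\
  \hat{f}(2) &= \mathbb{E}[x^2 \cdot h_2(x)]
              = \frac{\mathbb{E}[x^4] - \mathbb{E}[x^2]}{\sqrt{2}}
              = \frac{3 - 1}{\sqrt{2}} = \sqrt{2}, \\
  f(x) &= 1 \cdot h_0(x) + \sqrt{2} \cdot h_2(x)
        = 1 + (x^2 - 1) = x^2, \\
  \|f\|_2^2 &= \mathbb{E}[x^4] = 3
             = \hat{f}(0)^2 + \hat{f}(2)^2 = 1 + 2.
\end{align*}
```

All other Hermite coefficients of $f$ vanish, since $f$ has degree 2 and is even, so Parseval's identity holds with just these two terms.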

Using the definition of influence from Section 2.1, it is not difficult to show that for any
$f : \mathbb{R}^n \to \mathbb{R}$ and any $i \in [n]$, we have $\mathrm{GI}_i(f) = \sum_{S : S_i > 0} \hat{f}(S)^2$ (see e.g. Lecture 4 of [Mos05]).






